Recent advances in genomics involving individuals from different races and geographical locations have led to the identification of thousands of common as well as rare genetic variants and copy number variations (CNVs). These studies have surprisingly revealed that the majority of genetic variation lies not within the coding region but in the non-coding region of the genome, which is also termed the “Medical Genome”. This short review describes how mutations or variations within regulatory sequences, architectural proteins and transcriptional regulators give rise to the aberrant gene expression profiles that drive cellular transformation and malignancy.
Unlike other pathophysiologies, understanding the role of mutations in cancer requires knowledge of both germline and somatic mutations in the given cancer type. To this end, several consortia have extensively catalogued both germline and somatic mutations using a variety of tools such as genotyping arrays, exome sequencing and whole-genome sequencing (WGS). To identify the functional mutations among the thousands discovered in such studies, it is crucial to know whether a mutation lies in the coding or the non-coding region. For non-coding mutations, knowledge of chromatin signatures is also necessary, because functional mutations in the non-coding genome act by affecting the regulation of coding genes. An ever-growing and evolving set of tools is routinely used to assess such signatures, including but not restricted to DNase hypersensitivity (DHS-seq), histone marks, transcription factor binding alterations (ChIP-seq), transcriptional changes (RNA-seq and GRO-seq) and chromatin looping (ChIA-PET and 3C-based tools). The mechanism by which these variants affect target gene expression is then characterized functionally using tools such as reporter assays in mice or relevant cell lines and/or genetic manipulation of the target sequences. Systems approaches have also been applied to mutations obtained from GWAS, linking them to core metabolic processes, which can help to understand the underlying principles of cancer progression and to design therapeutic approaches (1,2). The latest advances in sequencing techniques have allowed genome-wide profiling of regulatory regions and their associated protein cargos, accelerating the identification of functional mutations, a task that was otherwise akin to “finding a needle in a haystack”.
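The first triage step described above (coding versus non-coding, then overlap with a chromatin signature) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the intervals, the chromosome name and the function names are hypothetical and do not come from any dataset or pipeline used in the cited studies, and coordinates are assumed to be 0-based and half-open.

```python
from bisect import bisect_right

# Hypothetical, illustrative intervals: (chromosome, start, end), 0-based, half-open.
CODING_EXONS = [("chr8", 1000, 1200), ("chr8", 5000, 5300)]
OPEN_CHROMATIN = [("chr8", 3000, 3400), ("chr8", 5100, 5200)]  # e.g. DHS peaks


def build_index(intervals):
    """Group intervals by chromosome, sorted by start (assumed non-overlapping)."""
    index = {}
    for chrom, start, end in sorted(intervals):
        index.setdefault(chrom, []).append((start, end))
    return index


def overlaps(index, chrom, pos):
    """Return True if `pos` falls inside any interval on `chrom`."""
    spans = index.get(chrom, [])
    # Locate the last interval whose start is <= pos, then check its end.
    i = bisect_right(spans, (pos, float("inf"))) - 1
    return i >= 0 and spans[i][0] <= pos < spans[i][1]


def classify(variant, coding_index, chromatin_index):
    """Assign a variant (chrom, pos) to one of three coarse categories."""
    chrom, pos = variant
    if overlaps(coding_index, chrom, pos):
        return "coding"
    if overlaps(chromatin_index, chrom, pos):
        return "non-coding, candidate regulatory"
    return "non-coding, no regulatory signature"


coding_index = build_index(CODING_EXONS)
chromatin_index = build_index(OPEN_CHROMATIN)
for variant in [("chr8", 1100), ("chr8", 3100), ("chr8", 4000)]:
    print(variant, classify(variant, coding_index, chromatin_index))
```

In practice each chromatin signature (histone marks, ChIP-seq peaks, looping anchors) would contribute its own interval set, and a variant would be prioritized by how many of them it overlaps; the sketch keeps a single set for brevity.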
Applying such tools to several cancers has revealed that enhancers carry the highest density of variants. For example, in prostate cancer, less than 20% of the variants were present in promoters or the coding region (3) and 88% of the SNPs fell in putative enhancers (4). Enhancers are at the heart of regulated gene transcription; they mediate the transcriptional regulation of genes in a spatio-temporal manner to drive cell fate during development. Numerous enhancers are born, erased, activated or repressed during cell fate choices to give rise to a tightly controlled gene expression pattern in a committed lineage (5). It is believed that enhancers undergo transitions similar to those of oncogenes during the cellular transformation from “normal” to “cancer-like”. Thus, perturbations in enhancer function can have potentially harmful pleiotropic effects on the cell. Although coding variation is more deleterious, as it can directly affect a protein's stability, composition, localization, interacting partners, function and activity, variation within enhancers may affect the transcription rate of several genes in response to an environmental stimulus or developmental stage(s), resulting in dysregulated protein levels. Several reports have now strongly linked malignancies with mutations in enhancers (6).
Variations within disease-causing enhancers can be of different types, namely insertions or deletions (indels), rearrangements or single nucleotide mutations. All these types of variation can lead to the loss of an existing enhancer, the de novo formation of an enhancer, or a gain or loss of transcription factor binding sites that alters enhancer activity (Figure 1). For example, it has been reported that individuals with single nucleotide deletions in the c-myc gene desert gain a MYB binding site, leading to the formation of a super-enhancer that activates TAL1 expression in T-cell acute lymphoblastic leukemia (T-ALL) (7). This study suggests how an organism can gain a new enhancer during evolution to target the expression of important genes. Not only deletions but also enhancer duplications aberrantly activate or repress target genes associated with monogenic diseases. For instance, duplication of an enhancer element upstream of the myc oncogene leads to the gain of a super-enhancer (8). In gliomas, small somatic rearrangements hijack the myb enhancer, leading to the overexpression of myb (9).
Figure 1. Structural mutations in enhancers: (i) The IGFBP5 gene is regulated by its enhancer. When a small stretch of the enhancer harbouring the CTCF motif is deleted, enhancer activity from the region is lost and the gene is down-regulated. Because IGFBP5 is a tumour-suppressor gene, its down-regulation leads to an increased risk of cancer. (ii) The oncogene TAL1 has some basal expression in the wild-type condition. Mutations lead to the gain of MYB motifs, which facilitate the birth of a super-enhancer that up-regulates TAL1 expression, leading to increased susceptibility to cancer. (iii) Mutations at the enhancer of the AR gene lead to amplification of the region harbouring the enhancer, which increases expression of the androgen receptor and thereby prostate cancer risk. (iv) An enhancer is located physically far from the TERT gene. Rearrangement places the enhancer in close physical proximity to the TERT oncogene, leading to its activation.
The third kind of variation, single nucleotide polymorphisms (SNPs) within enhancers, is the most interesting but least understood. The risk SNP often affects transcription factor binding in the core region of an enhancer, which is also where the eRNA transcription machinery assembles. Thus, loss or gain of TF binding at an enhancer can affect not only PolII loading but also the rate at which the eRNA is transcribed. Together, these affect the magnitude of target gene activation associated with disease pathogenesis (Figure 2). For example, the gene desert region beside myc is a hotspot for genetic variation associated with several cancers in different human races, and many GWAS have identified SNPs in this region that ultimately increase the expression of myc or of lncRNAs such as PCAT1, PRNCR1, CCAT1 and PVT1 within the same TAD as myc. The lncRNA PCAT1 in 8q24 is upregulated by rs7463708, which gains binding of the ONECUT2 transcription factor, leading to the activation of metastatic genes in trans (10). Similarly, the risk SNP rs11672691 allows switching from the short to the long isoform of the PCAT19 lncRNA, which increases prostate cancer susceptibility (11). There are now plenty of examples to conclude that common genetic variation within enhancers constitutes a significant portion of the functional variation linked to malignancies (Table 1).
Figure 2. Single nucleotide polymorphisms in enhancers: (i) A single nucleotide polymorphism (SNP) leads to the birth of an enhancer. With the non-risk allele, the target gene is not regulated by any enhancer, but with the risk allele an enhancer is born that targets the gene, raising its expression above normal levels. When such a gene is an oncogene, its upregulation might prove tumorigenic. (ii) An enhancer that targets its cognate gene harbours a SNP whose risk allele leads to the death of the enhancer and heterochromatinization of the enhancer region. This down-regulates the gene and, when the gene is a tumour-suppressor gene, increases susceptibility to cancer. (iii) An enhancer that targets its cognate gene harbours a risk SNP that leads to gain of binding of a transcriptional activator, which up-regulates the oncogene and hence increases susceptibility to cancer. (iv) An enhancer that targets its cognate gene harbours a risk SNP that leads to binding of a transcription factor different from the one bound when the wild-type allele was present. The new transcription factor activates the enhancer more potently, and the target oncogene is upregulated, leading to cancer susceptibility.
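The gain or loss of transcription factor binding caused by an enhancer SNP is commonly estimated in silico by scoring both alleles against a position weight matrix. The following is a minimal sketch of that idea, assuming a made-up 4-bp motif, a uniform background and forward-strand scanning only; the matrix, the sequences and the function names are hypothetical and do not represent MYB, ONECUT2 or any other real factor.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (consensus ATCA).
# Each entry gives the probability of A, C, G, T at one motif position.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform base composition


def log_odds(window):
    """Log2-odds score of one motif-length window against the background."""
    return sum(math.log2(MOTIF[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_score(seq):
    """Best motif score over all forward-strand windows of `seq`."""
    width = len(MOTIF)
    return max(log_odds(seq[i:i + width]) for i in range(len(seq) - width + 1))


def allele_effect(ref_seq, snp_index, alt_base):
    """Change in best motif score when the base at `snp_index` becomes `alt_base`."""
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    return best_score(alt_seq) - best_score(ref_seq)


# A G>C change that completes the ATCA consensus: positive delta = predicted gain.
delta = allele_effect("GGATGAGG", 4, "C")
print(f"delta score = {delta:.2f}")
```

A positive difference suggests that the alternative allele creates or strengthens a site (cases iii and iv of Figure 2), a negative one that it weakens a site. Such scores are only predictions; the reporter assays and ChIP-seq experiments discussed in the text remain necessary to confirm an effect.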
Type of Mutation | Gene | Cancer type | Reference |
---|---|---|---|
Gain of super enhancer | TAL1 | T-cell Acute Lymphoblastic Leukemia | 18 |
rs339331 | RFX6 | Prostate | 19 |
rs8072254/rs1859961 | SOX9 | Prostate | 20 |
rs67491583 | MYC | Colorectal | 21 |
Enhancer translocation | MYB-QKI | Glioma | 9 |
rs7463708 | PCAT1 lncRNA | Prostate | 10 |
Insulator deletions | TAL1 | Lymphoblastic Leukemia | 12 |
rs965513 | FOXE1 and PTCSC2 | Thyroid | 22 |
rs554219, rs78540526, rs75915166 | CCND1 | Breast | 23 |
rs6983267 | MYC | Colorectal | 24 |
Enhancer deletion | MLH1 | Colorectal | 7 |
Enhancer amplification | AR | Castrate-resistant prostate cancer | 25 |
Enhancer invasion | MYCN-target genes | Neuroblastoma | 26 |
Enhancer amplification, translocation | MYC | Pediatric neuroblastomas | 27 |
Super-enhancer translocation | MYB | Adenoid cystic carcinoma | 28 |
rs11672691 | PCAT19 lncRNA | Prostate | 11 |
rs2981578, rs35054928, rs45631563 | FGFR2 | Breast | 13 |
rs920778 | HOTAIR | Esophageal squamous cell carcinoma | 29 |
rs12203592 | IRF4 | Acute Lymphocytic Leukemia | 30 |
Super-enhancer rearrangements | TERT | Pheochromocytomas | 31 |
The genome is compartmentalized into megabase-sized bins known as topologically associating domains (TADs). Generally, TADs exhibit two functional features, namely high intra-TAD interactions and low inter-TAD interactions. The low inter-TAD interactions result from strong physical barriers between TADs, known as boundaries or insulators. Over the course of evolution, break points around syntenic regions are enriched at boundaries, suggesting that break points within TADs are strongly selected against and indicating the self-regulated, modular nature of TADs. Experimental deletions or mutations at boundaries that merge two neighboring TADs result in the dysregulation of genes in the affected TADs. Likewise, mutations within and around insulator regions are enriched in tumors, where they alter TADs and ultimately dysregulate genes (12). Most TAD boundaries show two-fold higher binding of the transcription factor CTCF, and CTCF knockdown severely affects boundary function. Thus, it is not surprising that the loss or gain of CTCF binding site(s) at TADs affects the genes within them, as is evident in some cancers (Figure 3) (13).
Figure 3. Mutations and insulators: An insulator separates two TADs, one active and the other repressed, harbouring active and repressed genes respectively. When a mutation in the insulator and/or in a transcription factor that binds it (such as CTCF) abolishes insulator function, the adjacent TADs merge and all the genes show a similar pattern of expression, active expression in this case.
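The consequence of losing an insulator can be illustrated with a toy one-dimensional model in which TADs are simply the segments between sorted boundary positions. The sketch below is an assumption-laden simplification: the coordinates, the element names and both functions are hypothetical, and real TAD calls come from Hi-C contact maps rather than from a list of positions.

```python
from bisect import bisect_right


def assign_tads(boundaries, elements):
    """Map each element to the index of the TAD it falls in.

    `boundaries` are insulator positions along one chromosome; TAD 0 lies to the
    left of the first boundary, TAD 1 between the first and second, and so on.
    """
    ordered = sorted(boundaries)
    return {name: bisect_right(ordered, pos) for name, pos in elements.items()}


def share_tad(boundaries, elements, a, b):
    """Return True if elements `a` and `b` sit in the same TAD."""
    tads = assign_tads(boundaries, elements)
    return tads[a] == tads[b]


# Hypothetical positions (bp) of an active enhancer and a silent gene.
elements = {"enhancer_E": 900_000, "gene_X": 1_100_000}

wild_type = [1_000_000, 2_000_000]  # insulator between the enhancer and the gene
mutant = [2_000_000]                # same locus after loss of that insulator

print(share_tad(wild_type, elements, "enhancer_E", "gene_X"))  # insulated
print(share_tad(mutant, elements, "enhancer_E", "gene_X"))     # merged TAD
```

In the wild-type configuration the enhancer and the gene fall in different TADs, whereas removing the intervening boundary places them in the same domain, which is the situation in which a previously insulated gene can come under the influence of an enhancer in the neighboring TAD.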
A precise chromatin structure is the key to regulated transcription. Just as mutations in boundary elements cause activation or repression of key tumor suppressors or activators, mutations in architectural proteins contribute to genome-wide changes in chromatin arrangement that affect the regulation of several genes. For example, mutations in the STAG2 subunit of the cohesin complex, which together with CTCF establishes the insulated neighborhoods of TADs, have been reported in AML, Ewing sarcoma, bladder, melanoma, cervical and glioblastoma malignancies (14). Cohesin also plays a role in enhancer:promoter looping, and thus mutations in these proteins could lead to pleiotropic effects. Since STAG2 is located on ChrX, a single mutation could lead to loss of the protein. Similarly, the CDK8, CYCLIN C, MED12, MED13 and MED23 subunits of the mediator complex are also frequently mutated in several cancers (15, 16).
Cancer cells employ a wide array of mechanisms to alter the highly coordinated gene expression of normal cells, which leads to carcinogenesis. Not surprisingly, therefore, transcription factors, coactivators, corepressors and chromatin modifiers are the factors mutated most often in cancers (12). Since chromatin regulators are highly cell-type specific, mutations in them give rise to specific cancers; for example, mutations in nuclear receptors are specific to solid tumors. On the other hand, mutations in non-cell-type-specific or global regulators are a common theme in most tumors; for example, mutations in the zinc fingers of several transcription factors, particularly Krüppel-associated box (KRAB) domain-containing factors, are widespread across several cancers. Some reported mutations in transcriptional regulators in various cancer types are listed in Table 2.
Gene | Cancer | Reference |
---|---|---|
IRF4 | Chronic lymphocytic leukemia | 32 |
TET2 | Myeloid Cancer | 33 |
EZH2 | Myelodysplastic syndromes | 34 |
DNMT3A | Acute myeloid leukemia | 35 |
SMARCE1 | Familial multiple spinal meningiomas | 36 |
ARID1B | Childhood neuroblastoma | 37 |
BAF180 | Breast Cancer | 38 |
PBRM1 | Renal cell carcinoma | 39 |
ATRX | Glioblastoma | 40 |
DOT1 | Leukemia | 41 |
MLL2 and MLL3 | Medulloblastoma | 42 |
NUP98-NSD1 | Acute myeloid leukemia | 43 |
RUNX1 | Acute myeloid leukemia | 44 |
JMJD3 | Brainstem glioma | 45 |
BRD3 and BRD4 | Prostate Cancer | 46 |
CREBBP | Oesophageal Cancer | 47 |
EP300 | Acute lymphoblastic leukemia | 48 |
HDAC4 | Acute lymphoblastic leukemia | 49 |
SETD2 | Leukemia | 50 |
MLL1 | Acute lymphoblastic leukemia | 51 |
CTCF | Acute myeloid leukemia | 52 |
STAG2 | Several cancers | 53 |
RAD21 | Myeloid neoplasms | 54 |
SPDEF | Gastric cancer | 55 |
MYC | Several cancers | 56 |
The most fundamental aspect of functional enhancers and their associated eRNAs is that they are highly cell- and tissue-type specific. Organismal development is also largely governed by these spatially and temporally regulated enhancers, which brings the corresponding eRNAs to the fore. Though mutations in enhancers affect the expression of the corresponding eRNAs, a direct link between cancer and mutations in eRNAs has not yet been established. eRNAs themselves can serve many roles, or the act of their transcription alone can be important. For instance, divergent transcription from a super-enhancer in the gene body of an oncogene can stall PolII; the stalled polymerases create ssDNA that is recognized by Activation-Induced Cytidine Deaminase (AID), which in turn can cause the DSBs that lead to translocations (17). Further locus-based as well as genome-wide studies are needed to fully appreciate the breadth and depth of their functions.
At present, hundreds of thousands of regulatory elements have been identified by ENCODE, Roadmap, FANTOM5 and a very recent project, Blueprint/IHEC. However, large-scale efforts to unravel the functionality of these genomic elements in different tissues and cancers are still lacking. Once these enhancers and their functions are identified, the search for causative mutations can be accelerated, which will then pave the way for correcting enhanceropathies.
The authors apologize to all authors whose work could not be cited due to space constraints. The authors are grateful to all the lab members for the fruitful discussions. KW acknowledges the SPM fellowship from the Council of Scientific and Industrial Research (CSIR), India. DN acknowledges the financial support from the Wellcome Trust/India Alliance as well as the intramural funds from NCBS-TIFR. DN acknowledges support from the EMBO global investigator program. The authors declare no competing interests.
CNVs: Copy number variations
WGS: Whole-genome sequencing
DHS: DNase hypersensitivity
ChIP: Chromatin immunoprecipitation
GRO: Global run-on
ChIA-PET: Chromatin Interaction Analysis by Paired-End Tag
3C: Chromatin conformation capture
GWAS: Genome-wide association study
SNP: Single nucleotide polymorphism
TF: Transcription factor
TAD: Topologically associating domain
DSB: Double strand break