H-DNA as a Potential cis-Element Regulating Gene Expression
Subhankar Chakraborty
DNA triplexes are formed when a polypurine segment of duplex DNA is recognized and bound by a single-stranded oligonucleotide that lodges in the major groove of the duplex. Such a single-stranded oligonucleotide is termed a triplex-forming oligonucleotide (TFO), while the sequence with which it interacts is called the triplex-targeting sequence (TTS). The TFO can be DNA, RNA or any oligonucleotide with modifications in its nucleobases or its phosphoribose backbone. Based on the orientation of the TFO with respect to the central polypurine tract, two types of triplex can be discerned: i) the parallel form, characterized by formation of Hoogsteen bonds between the TFO, which is typically pyrimidine rich, and the polypurine strand of the duplex, and ii) the antiparallel form, wherein the TFO is purine rich and forms reverse-Hoogsteen bonds with the polypurine strand.
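The two orientations can be illustrated with a minimal Python sketch that derives a DNA TFO for a given polypurine target. The base-triplet rules used here (T.A-T and C+.G-C for the parallel motif; A.A-T and G.G-C for the antiparallel motif) are the commonly cited canonical ones and are an assumption of this sketch, not something specified in the text; the function name design_tfo is hypothetical.

```python
# Assumed canonical triplet rules (third-strand base . purine of the duplex):
#   parallel (pyrimidine) motif:    T.A-T and C+.G-C  (Hoogsteen)
#   antiparallel (purine) motif:    A.A-T and G.G-C   (reverse-Hoogsteen)
PARALLEL_RULES = {"A": "T", "G": "C"}
ANTIPARALLEL_RULES = {"A": "A", "G": "G"}


def design_tfo(purine_strand: str, motif: str = "parallel") -> str:
    """Return a DNA TFO, written 5'->3', for a polypurine target given 5'->3'."""
    target = purine_strand.upper()
    if not target or set(target) - {"A", "G"}:
        raise ValueError("target must be a non-empty polypurine (A/G) tract")
    if motif == "parallel":
        # Same 5'->3' direction as the purine strand.
        return "".join(PARALLEL_RULES[base] for base in target)
    if motif == "antiparallel":
        # Opposite direction, so the target is read in reverse.
        return "".join(ANTIPARALLEL_RULES[base] for base in reversed(target))
    raise ValueError("motif must be 'parallel' or 'antiparallel'")


if __name__ == "__main__":
    print(design_tfo("AAGGAG"))                  # parallel, pyrimidine-rich TFO
    print(design_tfo("AAGGAG", "antiparallel"))  # antiparallel, purine-rich TFO
```

The sketch ignores modified bases, RNA third strands and alternative antiparallel pairings (such as T.A-T), all of which would need additional rules.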
An analysis of the human genome showed that TTS regions (sequences that are expected to form stable triplexes under physiological conditions) are vastly over-represented in the genome, far more than would be expected to occur purely by chance. Significantly, the greatest number of these sequences occur just upstream of regulatory regions of genes (mostly within 100 nucleotides 5′ of the promoter region). Moreover, genes with a TTS upstream of their promoters have important physiological functions, often coding for transcription factors or related proteins. However, mapping of the promoter region up to 200 bases upstream of the transcription start site in those genes that contained potential triplex-forming sequences did not reveal any consensus sequence for binding of transcription factors. Another possibility was that a TTS located in the promoter region of one gene forms a triplex with a complementary sequence in the intron region of another gene, thereby achieving negative regulation of gene expression. However, a search failed to reveal any relationship between genes with a TTS in their intron region and a complementary TFO in the intron of another gene. Even so, the gross over-abundance of TTS sequences is too striking to dismiss as non-functional. Investigation into the physical differences between TTS and duplex-forming DNA regions led to the observation that triplex-forming segments of DNA are significantly more curved and rigid than their duplex-forming counterparts. While flexibility of the DNA is a desirable feature for binding regulatory proteins, increased rigidity allows a segment to act as a spacer element that confines deformable segments of the DNA to specific regions of the genome. Further, the large curvature of the TTS regions could help position proteins in three-dimensional space, in turn establishing critical protein-protein interactions.
It has been pointed out that the distinctive properties of TTS regions can influence the rotational and translational phasing of nucleosomes. Many authors note that TTS regions are preferentially excluded from nucleosome formation. Taken together, these observations suggest that triplex-forming sequences in the promoter region can modulate gene expression, in some cases independently of their ability to form actual triplex structures.
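At the sequence level, candidate TTS regions of the kind discussed above are uninterrupted purine runs on one strand (equivalently, pyrimidine runs on the other). A minimal Python sketch of such a scan follows. The function name find_tracts and the default minimum length of 15 nt are assumptions made for illustration; the text does not state the length threshold or stability criteria used in the genome analysis.

```python
import re


def find_tracts(seq: str, min_len: int = 15):
    """Return (start, end, kind, tract) for purine-only or pyrimidine-only runs.

    Coordinates are 0-based and end-exclusive on the strand given. A
    pyrimidine run on this strand corresponds to a purine run on the
    complementary strand. min_len is an assumed threshold.
    """
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    hits = []
    for match in pattern.finditer(seq.upper()):
        kind = "polypurine" if match.group()[0] in "AG" else "polypyrimidine"
        hits.append((match.start(), match.end(), kind, match.group()))
    return hits


if __name__ == "__main__":
    example = "CATC" + "GAGAGGAAGA" + "TCAG"  # made-up sequence, not real data
    for hit in find_tracts(example, min_len=8):
        print(hit)
```

A scan like this only flags sequence composition; whether a given tract forms a stable triplex under physiological conditions depends on factors the sketch does not model.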
Example 1: Role of H-DNA formation in transcription of the heat shock protein gene hsp26 in Drosophila
Polypurine and polypyrimidine stretches abound in a non-random manner in the genomes of several organisms. One example of such a sequence is (CT)n.(GA)n which is seen in the promoter region of many genes, at origins of replication and recombination hotspots. Deletions or mutations in these homopurine-homopyrimidine stretches situated in the promoter region have been shown to decrease gene expression, thereby attesting to their functional importance. (CT)n.(GA)n sequences possess mirror symmetry and thus are capable of forming H-DNA structures in vitro.
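The mirror symmetry mentioned above can be checked computationally: a mirror repeat reads the same 5'->3' and 3'->5' on a single strand. The following Python sketch finds the longest such stretch and tests whether it is purine-only or pyrimidine-only. The function names and the min_len threshold are hypothetical, and the sketch ignores the short spacer that can separate the two halves of a mirror repeat.

```python
def longest_mirror_repeat(seq: str):
    """Return (start, end) of the longest stretch that reads the same in
    both directions on one strand (0-based, end-exclusive)."""
    s = seq.upper()
    best = (0, 0)
    for center in range(len(s)):
        # Try a center on a base (odd length) and between two bases (even length).
        for lo, hi in ((center, center), (center, center + 1)):
            while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
                lo -= 1
                hi += 1
            if hi - lo - 1 > best[1] - best[0]:
                best = (lo + 1, hi)
    return best


def is_h_dna_candidate(seq: str, min_len: int = 10) -> bool:
    """True if the longest mirror repeat is at least min_len long (an assumed
    threshold) and consists only of purines or only of pyrimidines."""
    start, end = longest_mirror_repeat(seq)
    stretch = seq.upper()[start:end]
    single_class = set(stretch) <= {"A", "G"} or set(stretch) <= {"C", "T"}
    return end - start >= min_len and single_class


if __name__ == "__main__":
    print(longest_mirror_repeat("CTCTCTC"))
    print(is_h_dna_candidate("CTCTCTCTCTCTC"))
```

Note that an even-length (GA)n string such as GAGAGA is not itself a perfect mirror, but it contains one (GAGAG) centered on a base, which is why the search looks for the longest mirror stretch rather than testing the whole tract.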
The H-DNA structure comprises a right-handed polypurine-polypyrimidine triple helix with a looped-out purine single strand. The intramolecular form of the triple helix arises when a polypyrimidine strand, positioned in the major groove, forms Hoogsteen base pairs with the parallel-oriented polypurine strand of a polypurine-polypyrimidine duplex. Hoogsteen base pairing of cytosine with guanine requires protonation of the N3 of the cytosine, and hence this structure is favored by an acidic pH in vitro.
Evidence suggesting existence of such a triplex structure in vivo has emerged from studies that used monoclonal antibodies specific for the triplex form of DNA. Using these antibodies, H-DNA has been localized to regions located on metaphase chromosomes and interphase nuclei of mouse myeloma cells. Another method used to investigate their presence in vivo is by using oligonucleotides hybridizing to single-stranded DNA. By this technique, triplex DNA has been demonstrated in the centromeric region of interphase chromosomes. Furthermore, a growing number of proteins are being discovered that bind to triplex DNA with high affinity, albeit most of them also bind duplex B-DNA.
There is accumulating experimental evidence that H-DNA plays an important role as a regulator of gene expression. For instance, S1 nuclease assays that target single-stranded stretches of DNA have mapped potential H-DNA-forming sites within the promoter regions of several genes, including the Hmga2 gene in mouse and humans, histone genes in humans as well as Drosophila, the genes encoding heat shock proteins in Drosophila, and the gene encoding a sodium channel subunit in humans. Triplex DNA can have both inhibitory and stimulatory effects on transcription. For instance, potential triplex-forming sequences placed upstream of the beta-galactosidase promoter increased its expression, while decreasing the expression of a similarly placed herpes simplex thymidine kinase.
Although the exact mechanism by which triplex formation regulates gene expression is still unclear, several possibilities have been suggested. Polypurine-polypyrimidine stretches have been shown to modulate chromatin structure by “absorbing” the effect of negative supercoiling generated by RNA polymerase, thus facilitating unwinding of the chromatin comprising the transcription bubble. Indirect evidence of a positive role for H-DNA in enhancing transcription comes from the association of this form of DNA with DNase I hypersensitivity sites (nucleosome-free regions of the chromatin that encompass important regulatory regions). Further, H-DNA has been demonstrated to alter histone-DNA interactions and the translational positioning of nucleosomes, and to induce kinks in adjacent B-DNA helices, which are hypothesized to hinder nucleosome assembly.
Another route by which H-DNA-forming DNA elements could influence chromatin structure is by binding to specific trans-activating factors, or by altering the chromatin architecture to make the binding of these factors more or less favorable. The promoter region of the hsp26 gene in Drosophila has two (CT)n.(GA)n regions, and their deletion or mutation drastically reduces transcription of the gene. Furthermore, bounding the (CT)n regions are two heat shock elements (HSE), which comprise DNase I hypersensitive (DH) sites. A trans-acting factor called GAGA factor binds to the polypurine stretch on either side of the hsp26 promoter. In vitro experiments have shown that GAGA factor plays an important role in nucleosome positioning. Chemical modification analysis of the proximal promoter region (from -135 to -85 bp relative to the start site of hsp26 gene transcription) reveals an S1 nuclease sensitive site that is capable of adopting the canonical H-DNA structure, with a dY-dR-dY triple helix and a single-stranded homopurine region. A C→T transition at position -107 (which is part of the (CT)n.(GA)n triplex-forming region) disrupted the H-DNA structure at pH 5.0, as determined by UV measurements. Furthermore, although a single-stranded region was present even in the mutant oligonucleotide, it was much shorter and less stable than that of the wild-type sequence. It was not possible to demonstrate the formation of H-DNA in vivo, although indirect evidence was observed in the form of pausing of RNA polymerase at the hsp promoter of the mutant construct. To dissect the role of the triplex-forming region in the hsp26 promoter separately from the effect of the GAGA transcription factor, mutant constructs were designed in which an hsp26 promoter bearing mutations in its (CT)n repeats (the GAGA binding site) drives expression of a downstream lacZ gene, and their expression levels were compared with that of a control.
The results show that H-DNA formation alone is unable to maintain transcription of the downstream lacZ reporter gene in the absence of GAGA binding. Moreover, triplex formation appeared to have a repressive effect on gene expression in mutants completely lacking a GAGA binding sequence. It has been shown previously that the GAGA factor binds to a (CT)n.(GA)n triple-helical DNA of the dY-dR-dY type, possibly stabilizing an intramolecular triple helix at the hsp26 promoter. The study of the hsp26 promoter thus indicates that formation of an H-DNA structure alone is insufficient to drive the expression of a gene. This suggests that the role of homopurine-homopyrimidine sequences in modulating gene expression by assuming unusual DNA structures might be less important than their role as binding sites for triplex-specific transcription activators.
Example 2: Polypurine-polypyrimidine tracts in the Hmga2 minimal promoter and transcription modulation of growth promoting genes
HMGA2 is a transcription factor that belongs to the HMGA family of non-histone chromosomal proteins. The two other members of this family, HMGA1a and HMGA1b (collectively called HMGA1, as they arise by alternative splicing of mRNA encoded by the same gene), are key components of multiprotein transcription complexes that modulate the expression of a number of genes, including those encoding tumor necrosis factor, E-selectin and the insulin receptor. HMGA2 has a similar role in transcription modulation. HMGA proteins are hardly detectable in normal adult tissues, but are expressed at high levels in several tumor cell lines and in benign mesenchymal tumors such as lipomas and uterine leiomyomas. The dysregulation of these proteins occurs through chromosomal rearrangements, making the HMGA genes perhaps the most frequently rearranged genes in human neoplasms. HMGA knock-out mice display a pygmy phenotype, and the protein has been shown to be essential during embryogenesis, especially for the development of adipose tissue.
A notable feature of the Hmga2 gene promoter is the presence of a 60 bp stretch of polypyrimidine-polypurine (ppy-ppu) sequence located a few base pairs upstream of the transcription start site. This segment is homologous to a functionally significant region present in the promoter of the epidermal growth factor receptor (EGF-R) gene, as well as in the 5′ flanking regions of other growth-related genes including c-myc, insulin receptor, androgen receptor, c-src, c-Ki-ras, transforming growth factor 3 and the platelet-derived growth factor A-region. The common feature of all the above-mentioned genes is that they have TATA-less promoters with ppy-ppu stretches in their proximal promoter regions. Analysis of truncation mutants of the Hmga2 promoter has revealed that a ppy-ppu stretch comprising 60 bp (from -84 to -25 relative to the transcription start site) contains multiple binding sites for the Sp1 transcription factor and is essential for transcription. As previously mentioned, ppy-ppu stretches are often components of regulatory regions of genes, and form triplex structures in vitro, as evidenced by sensitivity to the single-strand-specific nucleases S1 and P1. S1 nuclease analysis revealed that the ppy-ppu region upstream of the Hmga2 promoter also adopts a non-B DNA structure. Further, supercoiling is essential to form the triplex structure: when the DNA was linearized with HindIII (which cuts in the polylinker region of the cloning vector) prior to S1 nuclease treatment, the band corresponding to the single-stranded conformation disappeared. Fine mapping of the S1 nuclease sensitive region (called NSE1, for nuclease sensitive element 1), extending from -9 to -84 bp relative to the transcription start site, revealed that S1 nuclease cleavage sites occurred at regularly spaced intervals on the non-coding strand, whereas their distribution on the coding strand was more irregular.
An oligonucleotide band shift assay was used to examine possible H-DNA structure in vitro: a labeled TFO (derived from the -60 to -38 region of the Hmga2 promoter) was used to retard the migration of a potential triplex-forming duplex oligonucleotide (corresponding to the -84 to -62 region of the ppu-ppy tract) on a polyacrylamide gel. The assay showed that the ppu-ppy tract in the NSE1 region is capable of forming triplexes, at least in vitro. Experimental evidence shows that the ppu-ppy region of Hmga2 can adopt two conformations: a single-stranded form, in which it binds PTB (polypyrimidine tract binding protein), a protein that interacts with polypyrimidine tracts present in pre-mRNAs and is proposed to be involved in the regulation of several alternatively spliced genes and in internal ribosome entry; and a canonical duplex B-form, which binds the transcription factor Sp1. The case of Hmga2 is similar to that of the c-myc gene. It has been shown that the CT element of the c-myc gene, which provides a binding site for Sp1 when the DNA is double stranded, also binds another protein, hnRNP K, when it adopts a single-stranded conformation. hnRNP K activates gene transcription through interaction with several cis-elements in vivo. Thus, the CT element of the c-myc gene exists in two mutually exclusive conformations that allow binding of two conformation-specific transcription factors.
Based on this analogy and the experimental evidence, a model has been proposed for the working of the Hmga2 proximal promoter. According to this model, the stress of supercoiling present in the chromatin structure causes the ppu-ppy tract in the Hmga2 promoter to melt, adopting one of the possible non-B DNA conformations. PTB binds to the single-stranded binding site on the coding strand. PTB can then either function as an architectural element that stabilizes the open non-B DNA conformation, allowing the entry of RNA polymerase onto the template, or actually serve as a transcription factor (like hnRNP K). Sp1 binds to the same ppu-ppy region when it is in a duplex B-DNA conformation. Thus, the net effect on transcription of the Hmga2 gene would be determined by the relative balance between the activities of the two trans-acting factors. Parameters that affect the melting of the DNA, such as attachment of chromatin to the nuclear matrix, the activity of topoisomerases and the presence of nucleosomes (which severely interfere with the formation of non-B DNA structures), can dramatically influence the transition of the promoter region between canonical B-DNA and non-B DNA conformations. The use of chromatinized DNA templates to evaluate the function of the non-B DNA structure of the Hmga2 gene might provide an experimental scenario that more closely resembles the in vivo nuclear milieu.
Example 3: Regulation of c-myc gene by Ribonucleoprotein and Protein factors binding to the H-DNA forming Nuclease Sensitive Elements (NSE)
c-myc is a proto-oncogene. It prevents differentiation of mouse erythroleukemia cells, and when it is linked to the immunoglobulin enhancer element to create a transgene in mice, a B-cell lymphoma results. Further, silencing its expression using c-myc-specific antisense oligonucleotides results in growth inhibition and induction of differentiation in the human promyelocytic leukemia cell line HL-60, thereby emphasizing its role in promoting cell proliferation and blocking differentiation. It was observed that deletion of the region between -353 and -101 relative to the transcription start site reduced c-myc expression, particularly from the P1 promoter. Moreover, there is a single-strand-specific nuclease sensitive site centered around position -125 of the gene. By making small deletions around this nuclease sensitive site, Davis et al. studied its effect on single-strand-specific nuclease sensitivity and on transcription from the c-myc promoter, using a CAT (chloramphenicol acetyltransferase) reporter assay. They observed that deletions of 6, 11 or 42 bp resulted in full, weak and almost completely lost S1 nuclease sensitivity, respectively, while expression of the CAT reporter decreased to 53%, 26% and 31%, respectively, of that of the wild-type c-myc construct. This suggested that the nuclease sensitive sites act as positive cis-regulatory elements. An assay for nuclear proteins binding to this DNA element revealed that one of the factors was a ribonucleoprotein (hnRNP K), while the other was a protein. Both of these trans-acting factors bound to the NSE of c-myc, and their levels in the cell correlated directly with the levels of c-myc, further strengthening the conclusion that the NSEs are positive cis-regulatory elements.
Phenanthroline/Cu2+ DNA protection experiments, performed to determine the region of the DNA bound by these proteins, revealed that the RNP binds weakly to four nucleotides of the G-rich strand and to 26 nucleotides of the C-rich strand. Further, it also formed a dimer, which protected a longer segment of the DNA. At the 5′ boundary of the binding site there was a hypersensitive site, located at the same position on both the G-rich and C-rich strands. In contrast, the other NSE binding protein differentially protected the entire C strand from cleavage; it also produced a hypersensitive site that coincided with that produced by the RNP. Binding competition assays, using mutant oligonucleotides that compete with the wild-type c-myc NSE for binding to the RNP or to the other major NSE binding protein, allowed determination of the protein binding sites on the c-myc NSE. The consensus binding site consists of two ACCCT sequences (termed AC boxes) separated by a four-nucleotide spacer. Further, mutation in either of the two AC boxes was found to disrupt protein binding. In addition, the c-myc NSE has been shown to form an H-DNA structure in vitro. As the levels of the two trans-acting proteins correlated with active transcription of the c-myc gene, it is proposed that the RNP and the other major NSE binding protein bind to, and either induce or stabilize, the open H-DNA conformer.
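The consensus site described above (two ACCCT boxes separated by any four nucleotides) maps directly onto a regular expression. The following Python sketch scans one strand for such sites; the function name find_ac_box_sites is hypothetical, and the example sequences are made up for illustration rather than taken from the c-myc promoter.

```python
import re

# Two ACCCT "AC boxes" separated by a four-nucleotide spacer of any sequence.
# The lookahead lets overlapping occurrences be reported as well.
AC_BOX_SITE = re.compile(r"(?=(ACCCT[ACGT]{4}ACCCT))")


def find_ac_box_sites(seq: str):
    """Return (start, site) for each consensus site on the strand given (0-based)."""
    return [(m.start(), m.group(1)) for m in AC_BOX_SITE.finditer(seq.upper())]


if __name__ == "__main__":
    wild_type = "GG" + "ACCCT" + "TGCA" + "ACCCT" + "AA"  # made-up example
    mutant = "GG" + "ACGCT" + "TGCA" + "ACCCT" + "AA"     # first AC box mutated
    print(find_ac_box_sites(wild_type))
    print(find_ac_box_sites(mutant))
```

In this toy example a single base change in one AC box abolishes the match, which parallels the observation that mutating either box disrupts protein binding; the sketch says nothing about binding affinity itself.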
The existence of H-DNA in vivo is still a matter of debate, although TC.AG repeats up to 18 bp long occur once every 140-150 kb in the human genome. Also, sequences capable of forming H-DNA exist in regions that regulate transcription, replication and recombination. H-DNA formation relaxes negative supercoiling, raising the possibility that it could serve as a buffer for the chromatin when faced with surges of negative supercoiling (as during transcription). H-DNA formation may therefore be triggered during transcription. Further, as noted, specific physical characteristics of this DNA, such as the kinks and the single-stranded segments, can affect protein-DNA interactions. A strong bias towards stability of the H-y3 conformation and the non-interconvertible nature of the two conformers further suggest that the same DNA sequence may, depending on transient conditions in the cell, adopt either of the two forms. Thus, the use of H-DNA as a cis-element may be another tool in the repertoire of the cell for regulating the complex transcription machinery.