Retrotransposons are DNA sequences that can move within the genome. They do so through an RNA intermediate, which is reverse-transcribed into DNA before it is inserted at a new site. In mammals, retrotransposons fall into three classes: LTR retrotransposons (retrovirus-like elements), LINEs, and SINEs.
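Reverse transcription copies an RNA template into complementary DNA (cDNA). The minimal Python sketch below models only the base-pairing step. The function name `reverse_transcribe` is hypothetical, and real reverse transcriptases also need a primer and work processively along long templates.

```python
# Watson-Crick pairing used when a DNA strand is built on an RNA template.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the cDNA strand for an RNA template (hypothetical helper).

    Both strands are written 5'->3'. The new strand is antiparallel to the
    template, so the template is read from its 3' end, hence the reversal.
    """
    rna = rna.upper()
    invalid = set(rna) - set(RNA_TO_DNA_COMPLEMENT)
    if invalid:
        raise ValueError(f"not an RNA sequence: unexpected {sorted(invalid)}")
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


print(reverse_transcribe("AUGGCUUUAAAA"))  # TTTTAAAGCCAT
```

The 3′ poly-A tail of the template becomes a run of T at the 5′ end of the cDNA.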
LTR retrotransposons are in the minority and resemble retroviruses, with which they share their basic structure. In the central coding region, the pol gene encodes the reverse transcriptase and integrase, the gag gene encodes structural proteins involved in transposition, and the env gene, when present, encodes envelope proteins. In insects, elements that carry an env gene can move between cells. The central region is flanked by long terminal repeats (LTRs) of several hundred base pairs, which carry short inverted repeats at their termini. These elements can form virus-like particles in cells, and their sequences are homologous to retroviral genes. When they move, the original stays in place and a new copy is made at the new location.
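The layout described above, an internal coding region between two identical LTRs each bounded by a short inverted repeat, can be expressed as a simple sequence check. This is a toy Python sketch: `has_ltr_layout` and its parameters are hypothetical, the 8 bp LTR is made up, and its TG...CA ends are used only as a convenient inverted repeat. Real LTRs are several hundred base pairs long and accumulate mutations after insertion, so exact matching like this would miss most genomic copies.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def has_ltr_layout(element: str, ltr_len: int, ir_len: int = 2) -> bool:
    """Check an idealised LTR-retrotransposon layout (hypothetical helper).

    Returns True when the 5' and 3' LTRs (the first and last ltr_len bases)
    are identical direct repeats and each LTR is bounded by a short inverted
    repeat: its first ir_len bases are the reverse complement of its last ir_len.
    """
    element = element.upper()
    if ltr_len <= 0 or 2 * ltr_len > len(element):
        raise ValueError("ltr_len must be positive and fit twice in the element")
    if not 0 < ir_len <= ltr_len // 2:
        raise ValueError("ir_len must be positive and at most half of ltr_len")
    five_prime_ltr, three_prime_ltr = element[:ltr_len], element[-ltr_len:]
    if five_prime_ltr != three_prime_ltr:
        return False
    return five_prime_ltr[:ir_len] == reverse_complement(five_prime_ltr[-ir_len:])


# Made-up toy element: 8 bp LTR + stand-in for the gag/pol(/env) region + 8 bp LTR.
toy_ltr = "TGAAAACA"
toy_internal = "ATGGGGCCCTAA"
toy_element = toy_ltr + toy_internal + toy_ltr
print(has_ltr_layout(toy_element, ltr_len=8))              # True
print(has_ltr_layout(toy_element[:-1] + "G", ltr_len=8))   # False: 3' LTR now differs
```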
LINEs (long interspersed nuclear elements) encode their own reverse transcriptase and, like polyadenylated mRNA, end in a poly-A tail. In humans they form three families, LINE1, LINE2, and LINE3, of which only LINE1 (L1) remains active. Also known as Kpn repeats, LINE1 is present more than 600 000 times, occupies around 17% of the genome, and is distributed through the AT-rich regions of euchromatin. Individual copies are heterogeneous and average 800 bp, because most are truncated at their 5′ ends; a full-length element is about 6.1 kb. A full-length copy has an RNA pol II promoter in its 5′ UTR and two ORFs, encoding the RNA-binding protein p40 and a protein with both reverse transcriptase (RT) and endonuclease activity. LINEs move by being transcribed into RNA, which is exported to the cytoplasm and translated. The RNA and proteins assemble into a complex that returns to the nucleus, where the endonuclease nicks the DNA at TTTT/A sites and the LINE is inserted, usually in an AT-rich, gene-poor region. This minimises the mutational load that insertions impose on the host. The free 3′-OH end left at the nick then serves as the primer for reverse transcription of the LINE RNA.
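Scanning a sequence for the idealised TTTT/A target site shows where the L1 endonuclease could nick. The helper `find_l1_nick_sites` below is hypothetical and matches only the exact motif on the strand given; the real enzyme accepts degenerate variants of the site, so this is an illustration rather than a target-site predictor.

```python
import re


def find_l1_nick_sites(seq: str) -> list[int]:
    """Return 0-based positions of the A in each exact 5'-TTTT/A-3' site.

    The nick falls between the fourth T and the A, so each returned index is
    also the length of the upstream fragment, whose 3'-OH end reads ...TTTT.
    Only the strand passed in is scanned (hypothetical helper).
    """
    # TTTTA cannot overlap another TTTTA, so a plain non-overlapping search suffices.
    return [match.start() + 4 for match in re.finditer("TTTTA", seq.upper())]


target = "GCTTTTAAGCGTTTTTACC"
for pos in find_l1_nick_sites(target):
    print(f"nick before position {pos}: {target[:pos]} | {target[pos:]}")
```

The T-rich 3′ end exposed at each nick is what later pairs with the poly-A tail of the element's RNA.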
SINEs (short interspersed nuclear elements) are parasites of LINEs. They encode no proteins, including RT, and are transcribed from an internal RNA pol III promoter. They are flanked by short repeats and have a poly-A/poly-T tail at the 3′ end. The RT may act on the flanking repeats. The only active SINE is the Alu sequence, which occurs in primates, including humans. Alu elements are around 280-300 base pairs long, comprise around 10% of the genome, and usually occur as dimers. Because Alu does not encode its own RT, it uses that of L1. After the L1 endonuclease nicks the DNA at a TTTT/A site, Alu's poly-A tail pairs with the exposed TTTT sequence, so that the nicked strand primes reverse transcription of the Alu RNA. The L1 machinery then cuts the opposite strand, allowing Alu to integrate.
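The length and genome-fraction figures for Alu, and those for LINE1, imply rough copy numbers. The Python sketch below does the arithmetic; the haploid genome size of 3.1 × 10^9 bp is an assumption not given in the text, the helper name is hypothetical, and the results are order-of-magnitude checks only.

```python
# Assumption (not stated in the text): approximate human haploid genome size.
HAPLOID_GENOME_BP = 3.1e9


def implied_copy_number(genome_fraction: float, mean_length_bp: float,
                        genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Copies of a given mean length needed to fill a fraction of the genome."""
    if not 0 < genome_fraction <= 1 or mean_length_bp <= 0:
        raise ValueError("fraction must be in (0, 1] and length positive")
    return genome_fraction * genome_bp / mean_length_bp


for name, fraction, length in [("Alu, 280 bp", 0.10, 280),
                               ("Alu, 300 bp", 0.10, 300),
                               ("LINE1, 800 bp", 0.17, 800)]:
    print(f"{name}: ~{implied_copy_number(fraction, length):,.0f} copies")
```

Under these assumptions, Alu elements of 280-300 bp filling 10% of the genome would number roughly 1.0-1.1 million, and LINE1 copies averaging 800 bp filling 17% would number about 660 000, in line with the figure of more than 600 000 given for LINE1.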
All three classes of mammalian retrotransposons consist mainly of inactive members. The active members of two of these classes, LINEs and SINEs, use the same transposition machinery despite their structural differences. Although few retrotransposons remain mobile, these families together occupy large portions of the genome.