Instability and polymorphism at several CAG/CTG trinucleotide repeat loci have been associated with human genetic disorders. In an attempt to identify novel sites that may be possible loci for expansion of CAG/CTG repeats, we searched all human sequences in the EMBL nucleotide sequence database for (CAG)5 and (CTG)5 repeats. We have identified 121 human DNA sequences of known and unknown functions that contain stretches of five or more CAG or CTG repeats. Many repeat stretches were interrupted by variant triplets, a significant number of which differ from the repeat triplet only by a single base, suggesting that these evolved from the parent triplet by point mutations. A large number of human transcription factor genes were found to contain CAG repeats within their coding sequences. Analysis of the EMBL transcription factors database showed that many transcription factor genes of other eukaryotes, including genes involved in Drosophila embryo development, possess these repeats. Interestingly, CAG repeats are absent from prokaryotic transcription factors. Different sequence entries for the human TATA box binding protein showed a polymorphism in the length of the CAG repeat in this gene, suggesting that loci other than those already known to be associated with genetic diseases may be possible sites for repeat-instability-related disorders. On the basis of our findings in this database analysis, we propose a role for CAG repeats as cis-acting regulatory elements involved in fine-tuning gene expression.
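The search criterion described above (stretches of five or more CAG or CTG repeats, with interrupting triplets that differ from the parent triplet by a single base) can be illustrated with a minimal Python sketch. This is not the procedure used in the study, whose search tool is not described here; the function names `find_repeats` and `single_base_variant` are hypothetical, and the sketch scans only the strand it is given.

```python
import re

# Assumed criterion: five or more consecutive copies of CAG or of CTG.
REPEAT_RE = re.compile(r"(?:CAG){5,}|(?:CTG){5,}")


def find_repeats(seq):
    """Return (start, triplet, copies) for each uninterrupted run of >= 5 CAG or CTG.

    Hypothetical helper: `start` is the 0-based offset of the run in `seq`.
    """
    seq = seq.upper()
    hits = []
    for match in REPEAT_RE.finditer(seq):
        run = match.group(0)
        hits.append((match.start(), run[:3], len(run) // 3))
    return hits


def single_base_variant(triplet, parent):
    """True if `triplet` differs from `parent` at exactly one position."""
    if len(triplet) != len(parent):
        return False
    return sum(a != b for a, b in zip(triplet, parent)) == 1
```

For example, `find_repeats("AT" + "CAG" * 6 + "CAA" + "CTG" * 5)` reports one CAG run and one CTG run, and `single_base_variant("CAA", "CAG")` flags the interrupting triplet as a single-base variant of the repeat unit.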