Resumen
Tandem repeats (TRs) are important components of eukaryotic genomes; they have both structural and functional roles: (i) they form essential chromosome structures such as centromeres and telomeres; (ii) they modify chromatin structure and affect transcription, resulting in altered gene expression and protein abundance. There are established links between variations in TRs and incompatibilities between species, evolutionary development, chromosome mis-segregation, aging, cancer outcomes and different diseases. Given the importance of TRs, it seemed essential to develop an efficient, sensitive and automated application for the identification of all kinds of TRs in various genomic sequences. Here, we present our new GRM application for identifying TRs, which is designed to overcome all the limitations of the currently existing algorithms. Our GRM algorithm provides a straightforward identification of TRs using the frequency domain but avoiding the mapping of the symbolic DNA sequence into numerical sequence, and using key string matching, but avoiding the statistical methods of locally optimizing individual key strings. Using the GRM application, we analyzed human, chimpanzee and mouse chromosome 19 genome sequences (RefSeqs), and showed that our application was very fast, efficient and simple, with a powerful graphical user interface. It can identify all types of TRs, from the smallest (2 bp) to the very large, as large as tens of kilobasepairs. It does not require any prior knowledge of sequence structure and does not require any user-defined parameters or thresholds. In this way, it ensures that a full spectrum of TRs can be detected in just one step. Furthermore, it is robust to all types of mutations in repeat copies and can identify TRs with various complexities in the sequence pattern. From this perspective, we can conclude that the GRM application is an efficient, sensitive and automated method for the identification of all kinds of TRs.