Efficient Identification of Short Tandem Repeats via Context-Aware Motif Discovery and Ultra-Fast Sequence Alignment

Abstract

Tandem repeats (TRs) are highly polymorphic genomic elements that are associated with diverse molecular traits and implicated in numerous human diseases. However, large-scale analysis of TRs has been limited by computational challenges, including motif recognition, detection in complex regions, and excessive computational cost. Here we present FastSTR, a computationally efficient tool for precise detection and characterization of TRs. FastSTR integrates a context-aware N-gram motif model with a segmented global alignment algorithm to enable accurate motif identification and boundary definition, even for repeat units up to 8 bp. Across 13 species, FastSTR achieved >90% recall and 99% precision, running several times faster than existing methods while outperforming them in both sensitivity and accuracy. Applied to the human genome, FastSTR uncovered previously unannotated HSATII elements, resolved population-specific TR variation, and identified recurrent STR alterations in lung cancer. These results establish FastSTR as a versatile framework for TR annotation and discovery, advancing studies of genome evolution, genetic diversity, and disease.
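
To make the detection task concrete, the following is a minimal Python sketch of a naive tandem repeat scan for motifs of 1 to 8 bp. It is not FastSTR's algorithm: it uses exact string matching only, with no N-gram motif model and no alignment, so it finds perfect repeats and misses interrupted or mutated ones. The function name `find_tandem_repeats` and the parameters `max_motif_len` and `min_copies` are hypothetical, chosen here for illustration.

```python
def find_tandem_repeats(seq, max_motif_len=8, min_copies=3):
    """Return (start, end, motif, copies) tuples for perfect tandem repeats.

    Illustrative baseline only (an assumption of this sketch, not the
    method described in the abstract): exact matching, no mismatches
    or indels tolerated. Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None  # (span, motif_len, motif, copies)
        for k in range(1, max_motif_len + 1):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # not enough sequence left for a motif of length k
            # Count consecutive exact copies of the motif starting at i.
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_copies:
                span = copies * k
                # Keep the longest span; on ties the shorter motif wins
                # because it was examined first.
                if best is None or span > best[0]:
                    best = (span, k, motif, copies)
        if best:
            span, k, motif, copies = best
            hits.append((i, i + span, motif, copies))
            i += span  # skip past the reported repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    print(find_tandem_repeats("GGCACACACACATT"))
```

A scan of this kind runs in roughly O(n * max_motif_len) time on non-repetitive sequence but cannot place repeat boundaries when copies are imperfect, which is the situation that motif models and alignment-based boundary refinement are meant to handle.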
