Characterization of the substitution hotspots in SARS-CoV-2 genome using BioAider and detection of an SR-rich region in N protein providing further evidence of its animal origin
Abstract
The novel human coronavirus SARS-CoV-2 causes the worldwide coronavirus disease 2019 (COVID-19) pandemic. The growing volume of sequencing data has revealed abundant single-nucleotide variations in the SARS-CoV-2 genome. However, it remains difficult to analyze genomic variation quickly and to screen for key mutations of SARS-CoV-2. In this study, we developed a visual program, named BioAider, for quick and convenient sequence annotation and mutation analysis of multiple genome-sequencing datasets. Using BioAider, we conducted a comprehensive genome variation analysis of 3,240 SARS-CoV-2 genome sequences. We detected 14 substitution hotspots within the SARS-CoV-2 genome: 10 non-synonymous and 4 synonymous. Among these hotspots, NSP13-Y541C was predicted to be a crucial substitution that might affect the unwinding activity of NSP13, a key protein for viral replication. We also found 3 groups of potentially linked substitution hotspots that merit further study. In particular, we discovered an SR-rich region (aa 184-204) in the N protein of SARS-CoV-2 that is absent from SARS-CoV, suggesting a more complex replication mechanism and a unique N-M interaction in SARS-CoV-2. Interestingly, the number of SRXX repeat fragments in the SR-rich region closely reflected the evolutionary relationships among SARS-CoV-2 and SARS-CoV-2-related animal coronaviruses, providing further evidence of its animal origin. Overall, we developed an efficient tool for the rapid identification of mutations, identified substitution hotspots in SARS-CoV-2 genomes, and detected a distinctive polymorphic SR-rich region in the N protein. This tool and the detected hotspots could facilitate viral genomic studies and may contribute to the screening of antiviral target sites.
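The abstract does not describe how BioAider detects hotspots or counts SRXX repeats. As a hedged illustration only, the Python sketch below shows one generic way to do both. It tallies non-reference bases at each position of reference-aligned genomes, keeps substitutions whose frequency reaches a cut-off, and classifies each as synonymous or non-synonymous by translating the reference and mutated codons with the standard genetic code. Several elements are assumptions and do not come from BioAider's actual method: the function names `call_hotspots`, `classify` and `count_srxx`, the `min_freq` threshold, and the rule that an SRXX fragment is a non-overlapping `SR` followed by any two residues. The toy sequences are invented for demonstration.

```python
# Hypothetical sketch of hotspot calling and SRXX counting; NOT BioAider's implementation.
import re
from collections import Counter
from dataclasses import dataclass

# Standard genetic code (NCBI translation table 1), bases in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


@dataclass
class Substitution:
    pos: int          # 1-based position on the reference genome
    ref: str          # reference base
    alt: str          # substituted base
    count: int        # number of sequences carrying the substitution
    frequency: float  # count / total number of aligned sequences
    kind: str         # "synonymous", "non-synonymous", "non-coding" or "unknown"


def classify(reference, i, alt, cds):
    """Classify a substitution at 0-based index i; cds lists 0-based half-open ORFs.

    Only the first ORF containing i is used (overlapping ORFs are ignored).
    """
    for start, end in cds:
        if start <= i < end:
            offset = (i - start) % 3
            codon_start = i - offset
            ref_codon = reference[codon_start:codon_start + 3]
            alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
            ref_aa, alt_aa = CODON_TABLE.get(ref_codon), CODON_TABLE.get(alt_codon)
            if ref_aa is None or alt_aa is None:
                return "unknown"  # ambiguous base or truncated codon
            return "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return "non-coding"


def call_hotspots(reference, aligned, cds, min_freq=0.1):
    """Report A/C/G/T substitutions whose frequency reaches the hypothetical min_freq cut-off.

    Assumes every sequence in `aligned` is already aligned to `reference`
    (same length, no insertions); gaps and ambiguous bases are skipped but
    still count toward the denominator.
    """
    n = len(aligned)
    hits = []
    for i, ref_base in enumerate(reference):
        alts = Counter(s[i] for s in aligned if s[i] in "ACGT" and s[i] != ref_base)
        for alt, count in sorted(alts.items()):
            if count / n >= min_freq:
                hits.append(Substitution(i + 1, ref_base, alt, count, count / n,
                                         classify(reference, i, alt, cds)))
    return hits


def count_srxx(protein, start=184, end=204):
    """Count non-overlapping 'SR' + two-residue motifs in a 1-based inclusive region.

    The counting rule is an assumption made for illustration.
    """
    return len(re.findall(r"SR..", protein[start - 1:end]))


if __name__ == "__main__":
    ref = "ATGAAATTTTAA"  # toy ORF: M K F stop
    seqs = ["ATGAAGTTTTAA", "ATGAAGTTTTAA", "ATGGAATTTTAA", "ATGAAATTTTAA"]
    for h in call_hotspots(ref, seqs, cds=[(0, 12)], min_freq=0.25):
        print(h.pos, h.ref, h.alt, h.frequency, h.kind)
    print(count_srxx("SRGGSRSRNASR", start=1, end=12))
```

On the toy data this reports a non-synonymous A>G at position 4 (AAA to GAA, K to E) and a synonymous A>G at position 6 (AAA to AAG, both K), then an SRXX count of 2. A real pipeline would also need to handle insertions and deletions relative to the reference, overlapping ORFs, and linkage between hotspots, none of which this sketch attempts.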