Snapshot of the evolution and mutation patterns of SARS-CoV-2
Abstract
The COVID-19 pandemic is the most important public health threat in recent history. Here we study how its causal agent, SARS-CoV-2, has diversified genetically since it first emerged in December 2019. We created a pipeline combining phylogenetic and structural analyses to identify possible human-adaptation-related mutations in a data set of 4,894 complete SARS-CoV-2 genome sequences. Although the phylogenetic diversity of SARS-CoV-2 is low, the whole-genome phylogenetic tree can be divided into five clusters/clades based on tree topology and the clustering of specific mutations. However, its branches have short genetic distances and low bootstrap support values. We also identified 11 residues with high-frequency substitutions, four of which currently show some signal of potential positive selection. These fast-evolving sites are in the non-structural proteins nsp2, nsp5 (3CL-protease), nsp6, nsp12 (polymerase) and nsp13 (helicase), in the accessory proteins ORF3a and ORF8, and in the structural proteins N and S. Temporal and spatial analysis of these potentially adaptive mutations revealed that the frequency of some of them declined after reaching an (often local) peak, whereas the frequency of others continues to increase, and these now have a worldwide distribution. Structural analysis revealed that the mutations are located on the surfaces of the proteins and modulate their biochemical properties. We speculate that these changes improve binding to cellular proteins and hence represent fine-tuning of adaptation to human cells. Our study has implications for the design of biochemical and clinical experiments to assess whether important properties of SARS-CoV-2 have changed during the epidemic.