Intrahost dynamics, together with genetic and phenotypic effects, predict the success of viral mutations
Abstract
Predicting the fitness of mutations in the evolution of pathogens is a long-standing and important, yet largely unsolved, problem. In this study, we used SARS-CoV-2 as a model system to explore whether the intrahost diversity of viral infections could provide clues to the relative fitness of single amino acid variants (SAVs). To do so, we analysed ~15 million complete genomes and nearly 8000 sequencing libraries generated from SARS-CoV-2 infections, which were collected at various timepoints during the COVID-19 pandemic. Across timepoints, we found that many of the SAVs that went on to reach high frequency could be detected in the intrahost diversity of samples collected a median of 3-22 months earlier. Additionally, we found that genetic linkage patterns observed at the interhost level can also be observed in the intrahost diversity of infections. Applying machine learning models allowed us to learn highly generalisable intrahost, physicochemical and phenotypic patterns to forecast the future fitness of intrahost SAVs (r² = 0.48-0.63). Most of these models performed significantly better when genetic linkage between mutations was considered (r² = 0.53-0.67), pointing to epistasis as an important determinant in the evolution of SARS-CoV-2. Overall, our results highlight the predictive power of intrahost diversity data and document the evolutionary forces shaping the fitness of mutations. Such insights offer the potential to forecast the emergence of future variants and ultimately inform the design of vaccine targets.