Early detection and improved genomic surveillance of SARS-CoV-2 variants from deep sequencing data
Abstract
In defining effective strategies to counter the worldwide spread of SARS-CoV-2, maximum effort must be devoted to the early detection of dangerous variants. An effective aid to this end is the analysis of deep sequencing data from viral samples, data that are typically discarded once consensus sequences have been generated. Indeed, only deep sequencing data make it possible to identify intra-host low-frequency mutations, which are a direct footprint of the mutational processes that may eventually give rise to functionally advantageous variants. Accordingly, timely and statistically robust identification of such mutations might inform political decision-making considerably earlier than standard analyses based on consensus sequences.
To support our claim, we present the largest study to date of SARS-CoV-2 deep sequencing data, involving 220,788 high-quality samples collected over 20 months from 137 distinct studies. Importantly, we show that a considerable number of spike and nucleocapsid mutations of interest associated with the most widely circulating variants, including Beta, Delta and Omicron, might have been intercepted several months in advance, possibly leading to different public-health decisions. In addition, we show that a refined genomic surveillance system involving both high- and low-frequency mutations might allow one to pinpoint possibly dangerous emerging mutation patterns, providing data-driven, automated support to epidemiologists and virologists.