Partial RdRp sequences offer a robust method for Coronavirus subgenus classification
Abstract
The recent reclassification of the Riboviria and the introduction of multiple new taxonomic categories for coronaviruses (family Coronaviridae, subfamily Orthocoronavirinae), including both subfamilies and subgenera, represent a major shift in how official classifications are used to designate specific viral lineages. The newly defined subgenera provide much-needed standardisation for commonly cited viruses of public health importance. However, no method has been proposed for assigning a subgenus from partial sequence data, or to sequences that are divergent from the designated holotype reference genomes. Here, we describe the genetic variation of a partial region of the coronavirus RNA-dependent RNA polymerase (RdRp), one of the most widely used partial sequence loci for both detection and classification of coronaviruses in molecular epidemiology. We infer Bayesian phylogenies from more than 7000 publicly available coronavirus sequences and examine clade groupings relative to all subgenus holotype sequences. Our phylogenetic analyses are largely coherent with genome-scale analyses based on the designated holotype members of each subgenus. Pairwise distances between sequences form discrete clusters that correspond to taxa. These clusters offer logical threshold boundaries that can accurately and robustly attribute a subgenus, or flag sequences that likely belong to unclassified subgenera. We therefore propose that partial RdRp sequence data are sufficient for attributing subgenus-level taxonomic classifications to coronaviruses. We also supply the R package "MyCoV", which attributes a subgenus and assesses the reliability of that attribution.
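The sketch below illustrates the general idea of distance-threshold assignment described above. It is a minimal illustration under stated assumptions, not the authors' method or the MyCoV API. It assumes the query and reference sequences are already aligned to the same length, and it uses the uncorrected p-distance with gap columns skipped. The reference sequences are toy strings rather than real holotype RdRp fragments, and the 0.1 threshold is a hypothetical placeholder. The published cut-offs come from the observed distance clusters and may use a different distance measure.

```python
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no overlapping non-gap sites")
    return diffs / compared


def assign_subgenus(
    query: str, references: Dict[str, str], threshold: float
) -> Tuple[str, str, float]:
    """Return (call, nearest_reference_label, distance_to_nearest).

    The call is the nearest reference's label if its distance is within
    the (hypothetical) threshold, otherwise "unclassified".
    """
    best_label, best_dist = "", float("inf")
    for label, ref in references.items():
        d = p_distance(query, ref)
        if d < best_dist:
            best_label, best_dist = label, d
    call = best_label if best_dist <= threshold else "unclassified"
    return call, best_label, best_dist


# Toy aligned "reference" fragments labelled with real subgenus names;
# the sequences themselves are invented for illustration only.
REFERENCES = {
    "Sarbecovirus": "ACGTACGTACGTACGTACGT",
    "Merbecovirus": "TTGTACGAACCTACGTTCGA",
}

if __name__ == "__main__":
    HYPOTHETICAL_THRESHOLD = 0.1  # placeholder, not a published cut-off
    for q in ("ACGTACGTACGTACGTACGA", "GGGGACGTTTTTACGTCCCC"):
        print(assign_subgenus(q, REFERENCES, HYPOTHETICAL_THRESHOLD))
```

The function returns the distance to the nearest reference alongside the call. That value gives a crude indication of how confidently a sequence sits inside or outside a subgenus cluster. A real implementation would also need to handle alignment to the reference region and choose an appropriate substitution model.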
Importance Statement
The analysis of polymerase chain reaction (PCR) amplicons derived from biological samples is the most common modern method for detecting and classifying infecting viral agents such as coronaviruses. Recent updates to the official standard for coronavirus taxonomic classification, however, may leave researchers unsure whether the viral sequences they obtain by these methods can be assigned to specific viral taxa, because those sequences differ from the type strains. Here, we present a plausible method for defining genetic dissimilarity cut-offs that allow researchers to state which taxon their virus belongs to and with what level of certainty. To support this, we also provide the R package "MyCoV", which classifies user-generated sequences.