Long-read sequencing of SARS-CoV-2 reveals novel transcripts and a diverse complex transcriptome landscape
Abstract
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19, is a positive-sense single-stranded RNA virus with a ∼30 kb genome that is responsible for the current pandemic. To date, the genomes of global SARS-CoV-2 variants have been characterized primarily via short-read sequencing methods. Here, we devised a long-read RNA sequencing (IsoSeq) approach to characterize the SARS-CoV-2 transcript landscape and the expression of its ∼27 coding regions. Our analysis identified novel SARS-CoV-2 transcripts, including a) a short ∼65-70 nt 5’-UTR fused to various downstream ORFs encoding structural and accessory proteins, such as the envelope, ORF 8, and ORF 9 (nucleocapsid) proteins, that are relatively highly expressed; b) novel SNVs that are differentially expressed, a subset of which are suggestive of partial RNA editing events; and c) SNVs at functional sites, at least one of which is associated with a differentially expressed spike protein isoform. These previously uncharacterized SARS-CoV-2 isoforms, expressed genes, and gene variants were corroborated using ddPCR. Understanding this transcriptional complexity may provide insight into the biology and pathogenicity of SARS-CoV-2 compared to other coronaviruses.