Large scale genomic and evolutionary study reveals SARS-CoV-2 virus isolates from Bangladesh strongly correlate with European origin and not with China
Abstract
Rationale
Global public health is in serious crisis due to the emergence of SARS-CoV-2. Studies are ongoing to characterize the genomic variants of the virus circulating in various parts of the world. However, data from low- and middle-income countries are scarce because of resource limitations. In this study, we performed whole genome sequencing of 151 SARS-CoV-2 isolates from COVID-19 positive Bangladeshi patients. The goals were to identify the genomic variants among SARS-CoV-2 isolates in Bangladesh, to determine their molecular epidemiology, and to establish the relationship between host clinical traits and viral genomic variants.
Method
Suspected patients were tested for COVID-19 using a one-step commercial qPCR kit for SARS-CoV-2. Viral RNA was extracted from positive patients and converted to cDNA, which was amplified using the Ion AmpliSeq™ SARS-CoV-2 Research Panel. Massively parallel sequencing was carried out using the Ion AmpliSeq™ Library Kit Plus. Raw data were assembled by aligning the reads to a pre-defined reference genome (NC_045512.2) and generating a consensus genome that retains the unique variations of the input data. A random forest-based association analysis was carried out to correlate the viral genomic variants with the clinical traits present in the host.
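The consensus step described above can be illustrated with a minimal sketch. It assumes that variant calls are single-nucleotide substitutions given as 1-based (position, reference base, alternate base) tuples. The function name `apply_substitutions` and the 10-base toy reference are hypothetical, chosen only for illustration, and do not represent the pipeline or data used in this study.

```python
def apply_substitutions(reference, variants):
    """Return a consensus sequence: the reference with each substitution applied.

    reference: reference sequence as a string.
    variants: iterable of (position, ref_base, alt_base), position is 1-based.
    """
    seq = list(reference)
    for pos, ref_base, alt_base in variants:
        idx = pos - 1  # convert the 1-based genomic position to a 0-based index
        if seq[idx] != ref_base:
            # Guard against calls that do not match the reference coordinates
            raise ValueError(
                f"reference mismatch at position {pos}: "
                f"expected {ref_base}, found {seq[idx]}"
            )
        seq[idx] = alt_base
    return "".join(seq)


# Toy 10-base reference (hypothetical, not NC_045512.2)
toy_reference = "ACGTACGTAC"
toy_variants = [(2, "C", "T"), (7, "G", "A")]
consensus = apply_substitutions(toy_reference, toy_variants)
print(consensus)
```

Checking the reference base before substituting catches coordinate errors early. Insertions and deletions would need additional handling that this sketch omits.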
Result
Among the 151 viral isolates, we observed 413 unique variants. Eight of these occurred in more than 80% of cases: 241C to T, 1163A to T, 3037C to T, 14408C to T, 23403A to G, 28881G to A, 28882G to A, and 28883G to C. Phylogenetic analysis revealed a predominance of variants belonging to the GR clade, which has a strong geographical presence in Europe, indicating possible introduction of SARS-CoV-2 into Bangladesh through a European channel. However, other possibilities, such as a route of entry from China, cannot be ruled out, as a viral isolate belonging to the L clade with a close relationship to the Wuhan reference genome was also detected. We observed a total of 37 genomic variants to be strongly associated with clinical symptoms such as fever, sore throat, and overall symptomatic status (Fisher’s Exact Test, p-value < 0.05). The most notable among those were 3916C to T (associated with sore throat, p-value = 0.0005), 14408C to T (associated with protection from developing cough, p-value = 0.027), and the 28881G to A, 28882G to A, and 28883G to C variants (associated with chest pain, p-value = 0.025).
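For readers unfamiliar with the test behind these p-values, the following is a minimal pure-Python sketch of a two-sided Fisher's Exact Test on a 2x2 table of variant presence against symptom presence. The function name `fisher_exact_two_sided` and the counts in the example are hypothetical and are not taken from the study data. The two-sided rule used here (summing all tables no more probable than the observed one) is one common convention and is an assumption about how the test was applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: variant present / variant absent.
    Columns: symptom present / symptom absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the
    # observed one; the small tolerance absorbs floating-point rounding.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: 3 of 4 variant carriers and 1 of 4 non-carriers
# report the symptom.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p-value = {p:.4f}")
```

With counts this small the test has little power, which is why the example does not reach significance. The exact test is preferred over a chi-squared approximation when expected cell counts are low.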
Conclusion
To our knowledge, this is the first large-scale phylogenomic study of SARS-CoV-2 circulating in Bangladesh. The observed epidemiological and genomic features may inform future research on disease management, vaccine development, and epidemiology.