Analysis of 46,046 SARS-CoV-2 whole-genomes leveraging principal component analysis (PCA)

Abstract

Since the beginning of the global SARS-CoV-2 pandemic, there have been a number of efforts to understand the mutations of the SARS-CoV-2 virus and the clustering of its genetic lineages. Until now, phylogenetic analysis methods have been used for this purpose. Here we show that principal component analysis (PCA), which is widely used in population genetics, not only helps explain existing findings about the mutation processes of the virus but also provides deeper insights into these processes, while being less sensitive to sequencing gaps. We describe a comprehensive analysis of a dataset of 46,046 SARS-CoV-2 genome sequences downloaded from the GISAID database in June of this year.

Summary

Applied to large data sets of SARS-CoV-2 genomes, PCA provides deep insights and reveals virus lineages that have thus far gone unnoticed.
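
To make the general idea concrete, the following is a minimal sketch of PCA applied to aligned nucleotide sequences. It is not the pipeline used in the article, whose details are not given here. The one-hot encoding, the choice to encode gaps and ambiguous bases as all-zero blocks, the helper names `one_hot_encode` and `pca`, and the toy sequences are all assumptions made for illustration.

```python
import numpy as np

BASES = "ACGT"


def one_hot_encode(seqs):
    """Encode equal-length aligned sequences as an (n_samples, 4 * length) matrix.

    Assumption for this sketch: any character outside A/C/G/T (for example
    a gap '-' or an ambiguous 'N') is encoded as an all-zero block.
    """
    length = len(seqs[0])
    X = np.zeros((len(seqs), 4 * length), dtype=float)
    for i, seq in enumerate(seqs):
        if len(seq) != length:
            raise ValueError("sequences must be aligned to the same length")
        for j, ch in enumerate(seq.upper()):
            k = BASES.find(ch)
            if k >= 0:
                X[i, 4 * j + k] = 1.0
    return X


def pca(X, n_components=2):
    """Return PCA scores and explained-variance ratios via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    total = np.sum(S ** 2)
    if total == 0:
        raise ValueError("all samples are identical; no variance to decompose")
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2 / total)[:n_components]
    return scores, explained


# Hypothetical toy alignment: two groups of three sequences that differ at
# three sites, with one gapped or ambiguous position in each group.
toy_seqs = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTNCGTAC",
    "ATGTACCTAG",
    "ATGTACCTAG",
    "ATGT-CCTAG",
]

X = one_hot_encode(toy_seqs)
scores, explained = pca(X, n_components=2)

for seq, (pc1, pc2) in zip(toy_seqs, scores):
    print(f"{seq}  PC1={pc1:+.3f}  PC2={pc2:+.3f}")
print("explained variance ratio:", np.round(explained, 3))
```

In this toy example the first component separates the two groups, and the sequences containing a gap or an `N` stay with their group on that axis. A data set of tens of thousands of genomes would more realistically call for a sparse encoding and a truncated or randomized SVD rather than the dense decomposition used here.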
