A critical reexamination of recovered SARS-CoV-2 sequencing data
Abstract
SARS-CoV-2 genomes collected at the onset of the Covid-19 pandemic are valuable because they could help understand how the virus entered the human population. In 2021, Jesse Bloom reported on the recovery of a dataset of raw sequencing reads that had been removed from the NCBI SRA database at the request of the data generators, a scientific team at Wuhan University (Wang et al., 2020b). Bloom concluded that the data deletion had obfuscated the origin of SARS-CoV-2 and suggested that deletion may have been requested to comply with a government order; further, he questioned reported sample collection dates on and after January 30, 2020. Here, we show that sample collection dates were published in 2020 by Wang et al. together with the sequencing reads, and match the dates given by the authors in 2021. Collection dates of January 30, 2020 were manually removed by Bloom during his analysis of the data. We examine mutations in these sequences and confirm that they are entirely consistent with the previously known genetic diversity of SARS-CoV-2 of late January 2020. Finally, we explain how an apparent phylogenetic rooting paradox described by Bloom was resolved by subsequent analysis. Our reanalysis demonstrates that there was no basis to question the sample collection dates published by Wang et al.
Note for bioRxiv readers
The automatically generated Full Text version of our manuscript is missing footnotes; they are available in the PDF version.