Deconvolute individual genomes from metagenome sequences through read clustering

Kexue Li
Lili Wang
Lizhen Shi
Li Deng
Zhong Wang

Published on Apr 29, 2019

Abstract

Motivation

Metagenome assembly from short next-generation sequencing reads is challenging because of the large scale of the data and the computational complexity of the task. Clustering short reads before assembly makes it possible to assemble genomes in parallel downstream, each with individualized optimization. However, current read clustering methods suffer from either false negatives (under-clustering) or false positives (over-clustering).

Results

We previously developed SpaRC, a scalable read clustering method on Apache Spark that has a very low false-positive rate. Here we extend its capability by adding a new method that further clusters small clusters. This method exploits statistics derived from multiple samples in a dataset to reduce the under-clustering problem. Using a synthetic dataset from mouse gut microbiomes, we show that this method has the potential to cluster almost all of the reads from genomes with sufficient sequencing coverage. We also explored several clustering parameters that differentially affect genomes with different sequencing coverage.
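
The abstract does not say which multi-sample statistics are used, so the following Python sketch only illustrates the general idea under stated assumptions: each small cluster is summarized by its read count in every sample, and clusters whose count profiles co-vary strongly across samples are merged. The function names, the Pearson correlation criterion, and the `min_corr` threshold are all hypothetical and are not taken from SpaRC.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / sqrt(var_x * var_y)


def merge_by_sample_profile(profiles, min_corr=0.9):
    """Group clusters whose per-sample read counts co-vary.

    profiles: dict mapping cluster id -> list of read counts, one per sample.
    min_corr: hypothetical threshold, not a value from the article.
    Returns a list of sets of cluster ids.
    """
    # Union-find over cluster ids: each cluster starts as its own group.
    parent = {c: c for c in profiles}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path compression
            c = parent[c]
        return c

    # Merge every pair of clusters with a sufficiently correlated profile.
    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= min_corr:
            parent[find(a)] = find(b)

    groups = {}
    for c in profiles:
        groups.setdefault(find(c), set()).add(c)
    return list(groups.values())


# Toy input: c1 and c2 rise and fall together across four samples, c3 does not.
profiles = {
    "c1": [10, 40, 5, 80],
    "c2": [12, 38, 6, 79],
    "c3": [50, 5, 60, 2],
}
merged = merge_by_sample_profile(profiles)
```

In this toy input, `c1` and `c2` end up in one group and `c3` stays separate. The pairwise comparison is quadratic in the number of clusters, so at real scale it would need a distributed or indexed formulation; the sketch only shows why signal shared across samples can link clusters that read overlap alone leaves apart.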

Availability

https://bitbucket.org/berkeleylab/jgi-sparc/

Contact

zhongwang@lbl.gov
