Metaphor - A workflow for streamlined assembly and binning of metagenomes
Abstract
Recent advances in bioinformatics and high-throughput sequencing have enabled the large-scale recovery of genomes from metagenomes. This has the potential to yield important insights, as researchers can bypass cultivation and analyse genomes sourced directly from environmental samples. There are, however, technical challenges associated with this process, most notably the complexity of the computational workflows required to process metagenomic data, which include dozens of bioinformatics software tools, each with its own set of customisable parameters that affect the final output of the workflow. At the core of these workflows are two processes: assembly, which combines the short input reads into longer, contiguous fragments (contigs), and binning, which clusters these contigs into individual genome bins. Both processes can be run on each sample separately or on multiple samples pooled together, to leverage information from a combination of samples. Here we present Metaphor, a fully automated workflow for genome-resolved metagenomics (GRM). Metaphor differs from existing GRM workflows by offering flexible approaches for the assembly and binning of the input data, and by combining multiple binning algorithms with a bin refinement step to achieve high-quality genome bins. Moreover, Metaphor generates reports to evaluate the performance of the workflow. We showcase the functionality of Metaphor on different synthetic datasets and demonstrate the impact of the available assembly and binning strategies on the final results. The workflow is freely available at https://github.com/vinisalazar/metaphor.
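The difference between processing each sample separately and pooling samples can be illustrated with a minimal sketch. This is not Metaphor's implementation or configuration; the function name `plan_jobs`, the strategy labels `"individual"` and `"pooled"`, and the sample names are hypothetical and chosen only for illustration. The sketch shows how the choice of strategy changes which samples contribute reads to each assembly or binning job.

```python
def plan_jobs(samples, strategy="individual"):
    """Map job names to the list of samples each job would process.

    Hypothetical helper for illustration only; it does not reflect
    Metaphor's actual code or configuration keys.
    """
    if strategy == "individual":
        # One job per sample: each sample is assembled/binned on its own.
        return {name: [name] for name in samples}
    if strategy == "pooled":
        # A single job that uses all samples together.
        return {"pooled": list(samples)}
    raise ValueError(f"unknown strategy: {strategy!r}")


samples = ["sample_A", "sample_B", "sample_C"]
individual_jobs = plan_jobs(samples, "individual")
pooled_jobs = plan_jobs(samples, "pooled")
```

With three samples, the individual strategy yields three independent jobs, whereas the pooled strategy yields one job that draws on all three samples, which typically requires more memory per job.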
Author summary
We present Metaphor, a user-friendly, automated workflow for the recovery of genomes from metagenomes. Our tool offers flexible options for assembly and binning that can be adjusted according to the characteristics of the input data and the available computational resources, and it combines multiple binning algorithms to improve the quantity and quality of the resulting genome bins. We showcase the performance of Metaphor on synthetic benchmarking datasets and discuss the implications of methodological decisions regarding the assembly and binning strategy.