Struo2: efficient metagenome profiling database construction for ever-expanding microbial genome datasets
Abstract
Mapping metagenome reads to reference databases is the standard approach for assessing microbial taxonomic and functional diversity from metagenomic data. However, public reference databases often lack recently generated genomic data such as metagenome-assembled genomes (MAGs), which can limit the sensitivity of read-mapping approaches. We previously developed the Struo pipeline to provide a straightforward method for constructing custom databases; however, the pipeline does not scale well with the ever-increasing number of publicly available microbial genomes. Moreover, the pipeline does not allow for efficient database updating as new data are generated. To address these issues, we developed Struo2, which is >3.5-fold faster than Struo at database generation and can also efficiently update existing databases. We also provide custom Kraken2, Bracken, and HUMAnN3 databases that can be easily updated with new genomes and/or individual gene sequences. Struo2 makes database generation feasible for large-scale genomic datasets that continue to grow.
Availability
Struo2: https://github.com/leylabmpi/Struo2
Pre-built databases: http://ftp.tue.mpg.de/ebio/projects/struo2/
Utility tools: https://github.com/nick-youngblut/gtdb_to_taxdump