Mantis: flexible and consensus-driven genome annotation
Abstract
Background
The past decades have seen rapid development of the (meta-)omics fields, producing an unprecedented amount of data. Using well-characterized datasets, we can infer the role of previously functionally unannotated proteins from single organisms and consortia. In this context, protein function annotation allows the identification of regions of interest (i.e., domains) in protein sequences and the assignment of biological functions. Despite the existence of numerous tools, challenges remain, specifically in terms of speed, flexibility, and reproducibility. In the era of big data, it is also increasingly important to stop limiting our findings to a single reference and instead coalesce knowledge from different data sources, thus overcoming some of the limitations of relying too heavily on computationally generated data.
Results
We implemented a protein annotation tool, Mantis, which uses text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output. Mantis is flexible, allowing full customization of the reference data used; it is also adaptable and reproducible across different research goals and user environments. We implemented a depth-first search algorithm for domain-specific annotation, which led to an average increase in precision of 0.038 compared to sequence-wide annotation. Mantis is fast, annotating an average genome in 25-40 minutes, while also producing high-quality annotations (average coverage 81.4%, average precision 0.892).
Conclusions
Mantis is a protein function annotation tool that produces high-quality consensus-driven protein annotations. It is easy to set up, customize, and use, scaling from single genomes to large metagenomes. Mantis is available under the MIT license at https://github.com/PedroMTQ/mantis.