CoCoPyE: feature engineering for learning and prediction of genome quality indices
Abstract
The exploration of the microbial world has been greatly advanced by the reconstruction of genomes from metagenomic sequence data. However, the rapidly increasing number of metagenome-assembled genomes has also resulted in a wide variation in data quality. It is therefore essential to quantify the achieved completeness and possible contamination of a reconstructed genome before it is used in subsequent analyses. The classical approach to estimating these quality indices relies solely on a relatively small number of universal single-copy genes. More recent tools try to extend the genomic coverage of the estimates to increase accuracy. CoCoPyE is a fast tool based on a novel two-stage feature extraction and transformation scheme. It first identifies genomic markers and then refines the marker-based estimates with a machine learning approach. In our simulation studies, CoCoPyE predicted quality indices more accurately than the existing tools.
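As an illustration of the classical marker-based idea mentioned above, the following minimal sketch estimates completeness as the fraction of expected single-copy markers found at least once, and contamination as the number of surplus marker copies relative to the size of the marker set. This is a simplified textbook formulation, not CoCoPyE's two-stage method; the function name `marker_quality` and the marker identifiers are hypothetical and chosen only for this example.

```python
from collections import Counter


def marker_quality(observed_markers, marker_set):
    """Estimate (completeness, contamination) from single-copy marker hits.

    observed_markers: iterable of marker IDs detected in the genome
                      (an ID appears once per detected copy).
    marker_set:       collection of marker IDs expected exactly once
                      in a complete, uncontaminated genome.
    Both values are returned as fractions of the marker set size.
    """
    expected = set(marker_set)
    if not expected:
        raise ValueError("marker_set must not be empty")

    # Count only hits that belong to the expected marker set.
    counts = Counter(m for m in observed_markers if m in expected)

    # Completeness: share of expected markers seen at least once.
    completeness = len(counts) / len(expected)

    # Contamination: extra copies beyond the first, per expected marker.
    contamination = sum(c - 1 for c in counts.values()) / len(expected)

    return completeness, contamination


if __name__ == "__main__":
    # Hypothetical marker IDs; "x" is not in the marker set and is ignored.
    comp, cont = marker_quality(
        ["m1", "m2", "m2", "m3", "x"],
        ["m1", "m2", "m3", "m4"],
    )
    print(f"completeness={comp:.2f}, contamination={cont:.2f}")
```

In this toy input, three of four expected markers are present and one of them occurs twice, so the estimate is based on very few genes. That small marker count is the limitation the abstract points to when it notes that recent tools try to extend the genomic coverage of the estimates.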