CroCoDeEL: accurate control-free detection of cross-sample contamination in metagenomic data
Abstract
Metagenomic sequencing provides profound insights into microbial communities, but it is often compromised by technical biases, including cross-sample contamination. This underexplored phenomenon arises when microbial content is inadvertently exchanged among concurrently processed samples. Such contamination distorts microbial profiles and poses significant risks to the reliability of metagenomic data and downstream analyses. Despite its critical impact, this issue remains insufficiently addressed. To fill this gap, we introduce CroCoDeEL, a decision-support tool for detecting and quantifying cross-sample contamination. Leveraging a pre-trained supervised model, CroCoDeEL identifies contamination patterns in species abundance profiles with high accuracy. Unlike existing tools, it requires no negative controls or prior knowledge of sample processing positions, offering improved accuracy and versatility. Benchmarks across three public datasets demonstrate that CroCoDeEL accurately detects contaminated samples and identifies their contamination sources, even at low contamination rates (<0.1%), provided that sequencing depth is sufficient. Our findings suggest that cross-sample contamination is prevalent in metagenomics and emphasize the necessity of systematically integrating contamination detection into sequencing data quality control.