CroCoDeEL: accurate control-free detection of cross-sample contamination in metagenomic data

Abstract

Metagenomic sequencing provides profound insights into microbial communities, but it is often compromised by technical biases, including cross-sample contamination. This phenomenon arises when microbial content is inadvertently exchanged among concurrently processed samples, distorting microbial profiles and compromising the reliability of metagenomic data and downstream analyses. Existing detection methods often rely on negative controls, which are inconvenient and cannot detect contamination in the real samples themselves. Meanwhile, strain-level bioinformatics approaches lack sensitivity and fail to distinguish contamination from natural strain sharing. To fill this gap, we introduce CroCoDeEL, a decision-support tool for detecting and quantifying cross-sample contamination. Leveraging linear modeling and a pre-trained supervised model, CroCoDeEL identifies specific contamination patterns in species abundance profiles. It requires neither negative controls nor prior knowledge of sample processing positions, offering improved accuracy and versatility. Benchmarks across three public datasets demonstrate that CroCoDeEL accurately detects contaminated samples and identifies their contamination sources, even at low rates (<0.1%), provided sequencing depth is sufficient. Notably, we uncovered critical cases of contamination in highly cited studies, calling some of their results into question. Our findings suggest that cross-sample contamination is a widespread yet underexplored issue in metagenomics, and they emphasize the need to systematically integrate contamination detection into sequencing quality control.
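
The idea of a contamination pattern in abundance profiles can be illustrated with a toy calculation. If a target sample receives a fraction r of its content from a source sample, species present in the source but absent from the target's own community should appear in the target at roughly r times their source abundance. On a log-log plot of target against source abundances, those species fall on a line with slope 1 and intercept log10(r). The Python sketch below assumes this simplified mixing model only; the function name estimate_contamination_rate and its min_species and max_ratio thresholds are hypothetical, and it is not CroCoDeEL's implementation, which also relies on a pre-trained supervised model not reproduced here.

```python
import math
from typing import Dict, Optional, Tuple


# Hypothetical helper illustrating a log-log linear fit; NOT CroCoDeEL's algorithm.
def estimate_contamination_rate(
    source: Dict[str, float],
    target: Dict[str, float],
    min_species: int = 5,
    max_ratio: float = 0.1,
) -> Optional[Tuple[float, float]]:
    """Return (slope, rate) from a log10-log10 fit, or None if too few candidates.

    source, target: relative abundances (fractions) keyed by species name.
    Candidate contaminant species are detected in both samples, with a target
    abundance at most `max_ratio` times their source abundance (assumed filter).
    """
    pairs = [
        (math.log10(source[s]), math.log10(target[s]))
        for s in source.keys() & target.keys()
        if source[s] > 0 and 0 < target[s] <= max_ratio * source[s]
    ]
    if len(pairs) < min_species:
        return None  # not enough shared low-abundance species to fit a line

    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None  # all source abundances identical; slope is undefined

    # Ordinary least squares: log10(T) = slope * log10(S) + intercept.
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    # With the slope fixed at 1, the intercept is the mean log ratio,
    # so 10 ** intercept estimates the contamination rate r.
    rate = 10 ** (my - mx)
    return slope, rate


if __name__ == "__main__":
    source = {f"sp{i}": 0.01 * (i + 1) for i in range(10)}
    target = {name: 0.01 * ab for name, ab in source.items()}  # simulated 1% carry-over
    target["own_species"] = 0.9  # the target's own community
    print(estimate_contamination_rate(source, target))
```

In this simulated case, every source species is copied into the target at 1% of its source abundance, so the free fit gives a slope close to 1 and the fixed-slope estimate gives a rate close to 0.01. Reporting the free slope alongside the fixed-slope rate is a deliberate choice in this sketch: a slope far from 1 suggests the candidate species may not reflect simple mixing.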
