Benchmarking of Human Read Removal Strategies for Viral and Microbial Metagenomics
Abstract
Human reads are a key contaminant in microbial metagenomics and enrichment-based studies, and they must be removed for computational efficiency, biological analysis, and privacy protection. Various in silico methods exist, but their effectiveness depends on the parameters and reference genomes used. Here, we assess several methods, including the impact of using the updated T2T-CHM13 human genome assembly instead of GRCh38. Using a synthetic dataset of viral and human reads, we evaluated performance metrics for multiple approaches. We found that using a high-sensitivity configuration of Bowtie2 with the T2T-CHM13 reference assembly significantly improves human read removal with minimal loss of specificity, albeit at a higher computational cost than the other methods investigated. Applying this approach to a publicly available microbiome dataset, we effectively removed sex-determining SNPs with little impact on microbial assembly. Our results suggest that our high-sensitivity Bowtie2 approach with the T2T-CHM13 reference is the best method tested for minimising identifiability risks from residual human reads.
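Benchmarking on a synthetic dataset works because the true origin (human or viral) of every read is known in advance. The following is a minimal Python sketch of how sensitivity (the fraction of human reads removed) and specificity (the fraction of non-human reads retained) can be scored under that setup. It assumes that read IDs for each class are available as sets and that a tool's output has already been reduced to the set of read IDs it discarded. The function and field names are hypothetical and do not represent the study's actual pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RemovalMetrics:
    sensitivity: float  # fraction of true human reads that were removed
    specificity: float  # fraction of true non-human reads that were retained
    precision: float    # fraction of removed reads that were truly human


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a class is empty.
    return numerator / denominator if denominator else float("nan")


def removal_metrics(
    human_ids: set[str], nonhuman_ids: set[str], removed_ids: set[str]
) -> RemovalMetrics:
    """Score one human-read removal run against known synthetic labels (hypothetical helper)."""
    if human_ids & nonhuman_ids:
        raise ValueError("a read cannot be labelled both human and non-human")
    if removed_ids - human_ids - nonhuman_ids:
        raise ValueError("removed set contains read IDs absent from the truth labels")

    tp = len(human_ids & removed_ids)     # human reads correctly removed
    fn = len(human_ids - removed_ids)     # residual human reads left in the data
    fp = len(nonhuman_ids & removed_ids)  # viral/microbial reads wrongly removed
    tn = len(nonhuman_ids - removed_ids)  # viral/microbial reads correctly kept

    return RemovalMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        precision=_ratio(tp, tp + fp),
    )
```

For example, with four human reads and five viral reads, a tool that removes three of the human reads and one viral read scores a sensitivity of 0.75 and a specificity of 0.8. Residual human reads (false negatives) are the quantity that drives identifiability risk, so they are counted explicitly rather than inferred.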