Metagenome-assembled genomes from a population-based cohort uncover novel gut species and within-species diversity, revealing prevalent disease associations
Abstract
Metagenomic profiling has advanced our understanding of microbe-host interactions. However, widely used read-based approaches are limited by incomplete reference databases and cannot resolve strain-level variation. Here, we present a scalable, genome-resolved framework that integrates population-specific metagenome-assembled genomes (MAGs) to discover novel species, sub-species diversity, and disease associations. From 1,878 deeply sequenced samples in the Estonian microbiome cohort (EstMB-deep), we reconstructed 84,762 MAGs representing 2,257 species, including 353 (15.6%) previously uncharacterized species, some of which reached up to 30% relative abundance in individual samples. We integrated these MAGs with the Unified Human Gastrointestinal Genome (UHGG) collection to create an expanded reference (GUTrep), which we used to profile 2,509 EstMB individuals and test associations with 33 prevalent diseases. Of the 25 diseases with significant associations, 8 involved newly identified species, underscoring the value of population-specific MAGs. To quantify within-species diversity, we developed the Genome Unit Number (GUN), a novel MAG-based metric that guided our sub-species analyses. Based on normalized GUN (nGUN), we prioritized Odoribacter splanchnicus, a prevalent species with the lowest sub-species heterogeneity, which provided sufficient statistical power for a sub-species association study. We identified two dominant genome units, GU-N1 and GU-N2, with distinct gene repertoires and divergent disease associations. Notably, GU-N1 was negatively associated with gastritis and duodenitis and with hypertensive heart disease, associations that were not detected at the species level. Our study expands the human gut reference landscape, demonstrates the importance of population-specific MAGs for uncovering novel microbial diversity, and reveals new sub-species-level disease associations that are obscured at higher taxonomic levels, highlighting the need for genome-resolved approaches in microbiome research.
IMPORTANCE
Microbiome studies increasingly recognize that species-level profiles can mask critical sub-species differences relevant to health and disease. However, our work shows that within-species diversity varies drastically across gut microbes: some species exhibit almost as many distinct sub-species clusters as recovered genomes, which makes sub-species association studies for those species essentially intractable. To address this, we introduce the Genome Unit Number (GUN), a scalable metric for quantifying sub-species structure. Using GUN, we demonstrate that only species with limited within-species diversity, such as Odoribacter splanchnicus, currently allow robust sub-species association testing. These findings emphasize the need to systematically evaluate species structure across the gut microbiome and call for new computational and statistical approaches that enable meaningful sub-species analyses in highly diverse species.