Integrative metabolome-genome analysis reveals the genetic architecture of metabolic diversity in sorghum grain
Abstract
Background
Cereal grains are a cornerstone of global food and feed security, as well as a key raw material for bioenergy production. Dissecting the molecular networks and metabolic pathways that shape grain composition is critical for advancing agricultural and nutritional sciences. Sorghum (Sorghum bicolor), a climate-resilient C4 cereal with exceptional tolerance to heat and drought, is an important model for metabolic genomics. However, the genetic basis of natural variation in sorghum grain metabolites, key determinants of nutritional value and end-use quality, remains largely unexplored.
Results
By integrating metabolomics, genomics, and machine learning, we reveal extensive metabolic diversity in sorghum grain and its underlying genetic architecture. We performed large-scale untargeted metabolomic profiling of mature grains from the Sorghum Association Panel, detecting 4,877 metabolites that varied widely across accessions. More than 36% of metabolites, particularly those linked to amino acid and polyphenol biosynthesis, exhibited high natural diversity, underscoring sorghum’s potential for nutritional enhancement. Metabolite-based genome-wide association studies (mGWAS) identified approximately 4.15 million significant SNP–metabolite associations, which were enriched in regulatory regions. We further identified 38 metabolite–gene clusters, highlighting coordinated regulation of key pathways. Using machine learning, we identified the major metabolites that determine grain color and pinpointed associated genes, demonstrating a predictive framework for phenotype-genotype-metabolite relationships.
Conclusions
This study provides the first population-scale map of sorghum grain metabolomic and genetic diversity. To enable broad access and translational use, we established the Sorghum Grain Metabolite Diversity Atlas (SorGMDA), an open database integrating metabolomic and genomic variation. Together, these resources offer a foundation for comparative metabolic genomics across cereals, enable systems-level dissection of metabolic regulation, and support breeding of nutrient-dense, climate-resilient grains to address global food and agriculture challenges.