Identity-by-descent captures shared environmental factors at biobank scale
Abstract
"The apple does not fall far from the tree" is an old idiom that encapsulates a key concept: being related extends beyond sharing genetic material to include shared environments and culture. Using genomic and electronic health record data from 13,143 individuals in the Biorepository for Integrative Genomics, we applied a hierarchical community detection algorithm to classify individuals based on the proportion of their genome shared identical by descent (IBD). This approach captured fine-scale demographic structure beyond conventional ancestry classifications. By integrating neighborhood-level geographic data with census-derived environmental metrics, we revealed unequal exposure to environmental stressors across IBD-defined communities, which correlated with differential rates of health conditions. We found that two-thirds of the excess disease risk captured through IBD-based clustering remains unexplained by measured environmental factors. Notably, these community-level health disparities persisted after adjusting for self-reported race, demonstrating that IBD captures health-relevant variation beyond conventional demographic categories. We implemented an open-source dashboard that correlates IBD-defined subcommunities with disease prevalence and environmental exposures, enabling real-time clinical decision-making and public health surveillance. Overall, we demonstrate that IBD-based clustering jointly captures genetic and environmental determinants of health, offering a scalable framework for precision health and population genetics that translates biobank data into actionable insights for participants while maintaining privacy.