A genotype-phenotype transformer to assess and explain polygenic risk
Abstract
Genome-wide association studies have linked millions of genetic variants to biomedical phenotypes, but their utility has been limited by a lack of mechanistic understanding and by widespread epistatic interactions. Recently, transformer models have emerged as powerful machine learning architectures with the potential to address these and other challenges. Here we introduce the Genotype-to-Phenotype Transformer (G2PT), a framework for modeling hierarchical information flow among variants, genes, multigenic systems, and phenotypes. As a proof of concept, we train G2PT to model the genetics of metabolic traits including insulin resistance (serum triglyceride-to-HDL ratio), LDL, and type 2 diabetes. G2PT predicts these traits with accuracy exceeding the state of the art and, unlike other polygenic models, extends to distinct populations not used for training. Predictions of insulin resistance are based on >1,395 variants within 20 systems and include epistatic interactions among variants, e.g., between APOA4 and CETP in phospholipid transfer. This work positions hierarchical graph transformers as a next-generation approach to polygenic risk.