Haplotype-resolved diploid genome inference on pangenome graphs
Abstract
Genotyping is the task of identifying the genetic variants present in a sample from sequencing data, and it is a fundamental problem in computational biology. Existing genotyping approaches typically rely on either haplotype reference panels or pangenome graphs. Compared to haplotype reference panels, pangenome graphs compactly represent both small and large variants, making them a powerful and expressive reference model. Motivated by the haplotype reconstruction framework of Li and Stephens, Chandra et al. [Genome Research, 2025] introduced a deterministic formulation for genotyping, reduced it to a haploid genome inference problem on pangenome graphs, proved its NP-hardness, and proposed integer linear and quadratic programming approaches with strong empirical performance.
In this work, we introduce new problem formulations and scalable algorithms for inferring phased diploid genomes. We implement these methods in our tool <monospace>DipGenie</monospace> and evaluate phasing and structural variant calling accuracy on real Illumina short-read data. <monospace>DipGenie</monospace> achieves a switch error rate as low as 0.7% and an F1-score of up to 0.6 on structural variant calling, compared with a switch error rate of up to 7.0% and a structural variant calling F1-score of 0.5 for <monospace>VG</monospace>. These results show that <monospace>DipGenie</monospace> substantially reduces phasing errors while improving the accuracy of structural variant detection.
Implementation
<ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/gsc74/DipGenie">https://github.com/gsc74/DipGenie</ext-link>