GnnDebugger: GNN based error correction in De Bruijn Graphs
Abstract
Motivation
Modern sequencing technologies have enabled the reconstruction of complete mammalian genomes from telomere to telomere. However, scaling this achievement to thousands of species and population-level studies remains a challenge. Key bottlenecks include the low quality of draft assemblies and high coverage requirements. In particular, reconstructing complete and accurate sequences of both haplotypes in diploid genomes is difficult because the sequencing depth is not always sufficient to properly reconstruct diverged regions. Inspired by the success of neural networks in extracting patterns from data at massive scale, we introduce a method for correcting errors in De Bruijn graphs using Graph Neural Networks.
Results
Our model provides a reliable classification of edges into correct and erroneous, especially for diploid genomes with a coverage depth of 35 and lower. We demonstrate that these predictions can guide the downstream read error correction algorithm and genome assembly, ultimately yielding more accurate assemblies.
Availability and implementation
Both GnnDebugger (<ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/m5imunovic/gnndebugger">https://github.com/m5imunovic/gnndebugger</ext-link>) and LJA (<ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/AntonBankevich/LJA/tree/gnndebugger">https://github.com/AntonBankevich/LJA/tree/gnndebugger</ext-link>) are available on GitHub. Datasets used for training and testing the ML model are available on Zenodo: <ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.15073168">https://doi.org/10.5281/zenodo.15073168</ext-link>. The HG002 reference and reads are available at <ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/marbl/HG002">https://github.com/marbl/HG002</ext-link>. Primate references and reads are available at <ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/marbl/Primates">https://github.com/marbl/Primates</ext-link>.