A recurrence-based approach for validating structural variation using long-read sequencing technology
Abstract
Although numerous algorithms have been developed to identify structural variants (SVs) in genomic sequences, there is a dearth of approaches for evaluating their results. The emergence of new sequencing technologies that generate longer sequence reads can, in theory, provide direct evidence for all types of SVs regardless of the length of the region they span. However, current efforts to use these data in this manner require large computational resources to assemble the sequences as well as manual inspection of each region. Here, we present VaPoR, a highly efficient algorithm that autonomously validates large SV sets using long-read sequencing data. We assess the performance of VaPoR on both simulated and real SVs and report a high-fidelity rate for various features including overall accuracy, sensitivity of breakpoint precision, and predicted genotype.