Efficient evidence-based genome annotation with EviAnn

Abstract

For many years, machine learning-based ab initio gene finding approaches have been the central components of eukaryotic genome annotation pipelines, and they remain so today. The reliance on these approaches was originally sustained by the high cost and low availability of gene expression data, a primary source of evidence for gene annotation along with protein homology. However, innovations in modern sequencing technologies have revolutionized the acquisition of abundant gene expression data, allowing us to rely more heavily on this class of evidence. In addition to gene expression data, proteins found in a multitude of well-annotated genomes represent another invaluable resource for gene annotation. Existing annotation packages often underutilize these data sources, which prompted us to develop EviAnn (Evidence-based Annotation), a novel evidence-based eukaryotic gene annotation system. EviAnn takes a strongly data-driven approach, building the exon-intron structure of genes from transcript alignments or protein-sequence homology rather than from purely ab initio gene finding techniques. We show that when provided with the same input data, EviAnn consistently outperforms current state-of-the-art packages including BRAKER3, MAKER2, and FINDER, while utilizing considerably less computer time. Annotation of a mammalian genome can be completed in less than an hour on a single multi-core server. EviAnn is freely available under an open-source license from https://github.com/alekseyzimin/EviAnn_release.
