Leveraging Natural Language Processing models to decode the dark proteome across the Animal Tree of Life
Abstract
Functional annotation is crucial in biology, but many protein-coding genes remain uncharacterized, especially in non-model organisms. FANTASIA (Functional ANnoTAtion based on embedding space SImilArity) integrates protein language models for large-scale functional annotation. Applied to ∼1,000 animal proteomes, it predicts functions for virtually all proteins, revealing previously uncharacterized functions that enhance our understanding of molecular evolution. FANTASIA is available on GitHub at https://github.com/CBBIO/FANTASIA.