BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning

Adibvafa Fallahpour
Arman Seyed-Ahmadi
Parsa Idehpour
Omar Ibrahim
Purav Gupta
Jack Naimer
Kevin Zhu
Arnav Shah
Shihao Ma
Abhinav Adduri
Talu Güloglu
Nuo Liu
Haotian Cui
Arihant Jain
Max de Castro
Amirfaham Fallahpour
Antonio Cembellin-Prieto
John S. Stiles
Filip Nemčko
Alexander A. Nevue
Hyungseok C. Moon
Lucas Sosnick
Olivia Markham
Haonan Duan
Michelle Y. Y. Lee
Andrea F. M. Salvador
Chris J. Maddison
Christoph A. Thaiss
Chiara Ricci-Tam
Brian S. Plosky
Dave P. Burke
Patrick D. Hsu
Hani Goodarzi
Bo Wang

Published on Mar 20, 2026 (2 evaluations)

Abstract

Protein function annotation is fundamental to understanding biological mechanisms, designing therapeutics, and advancing biomedical research. Current computational methods either rely on shallow sequence similarity or treat function prediction as a set of isolated classification tasks, failing to capture the integrative reasoning across sequence, structure, domains, and interactions that expert biologists perform to infer function. We introduce BioReason-Pro, the first multimodal reasoning large language model (LLM) for protein function prediction that integrates protein embeddings with biological context to generate structured reasoning traces. A key input to BioReason-Pro is the set of GO term predictions made by GO-GPT, our autoregressive transformer that captures hierarchical and cross-aspect dependencies of GO terms. BioReason-Pro is trained via supervised fine-tuning on synthetic reasoning traces generated by GPT-5 for over 130K proteins and further optimized through reinforcement learning. It achieves 73.6% F_max on GO term prediction and an LLM judge score of 8/10 on functional summaries, substantially outperforming previous methods. Evaluations with human protein experts show that BioReason-Pro annotations are preferred over ground-truth UniProt annotations in 79% of cases. Remarkably, BioReason-Pro predicted experimentally confirmed binding partners de novo, with per-residue attention localizing to the exact contact residues resolved in cryo-EM structures of those complexes. Together, GO-GPT and BioReason-Pro establish a framework for protein function prediction that combines precise ontology modeling with interpretable biological reasoning.
