Qimai: a multi-agent framework for zero-shot DNA-protein interaction prediction
Abstract
Accurate prediction of DNA-protein interactions, a fundamental task in genomics, is limited by the poor generalization of existing models to novel proteins not seen during training. To address this challenge, we introduce Qimai, a modular AI agent framework that integrates deep learning predictions with biological evidence, using a large language model (LLM) as the reasoning engine. Qimai combines direct motif evidence from the query protein, indirect motif evidence from its interactors, and a quantitative prediction from a new transformer-based DNA-protein interaction (DPI) model to produce explainable predictions with confidence scores. On a benchmark of 78 unseen proteins, Qimai consistently outperforms standalone deep learning models across all metrics, increasing the area under the precision-recall curve (AUC-PR), the area under the receiver operating characteristic curve (AUC-ROC), and the Matthews correlation coefficient (MCC) by 17.6%, 15.6%, and 244%, respectively, compared with the best standalone model. Ablation analyses reveal that this gain is driven by the LLM's ability to dynamically weigh diverse evidence, with indirect motif evidence from co-factors being particularly critical for unseen proteins. Qimai establishes a generalizable and interpretable paradigm for integrating heterogeneous data in predictive genomics. The framework is accessible via the Qimai web portal (https://qimai.wanglab.ucsd.edu/).
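To make the reported metric gains easier to interpret, the following minimal Python sketch shows how MCC is computed from binary predictions and how a relative increase such as "244%" is defined. It is an illustration only and not the evaluation code used for Qimai: the function names `mcc` and `relative_increase_pct`, the toy labels, and the example scores are all hypothetical and are not taken from the benchmark.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def relative_increase_pct(baseline, improved):
    """Percentage increase of `improved` over a nonzero `baseline`."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical toy labels (not from the Qimai benchmark):
# 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_mcc = mcc(y_true, y_pred)

# Hypothetical scores chosen only to illustrate the arithmetic:
# a 244% increase means the new value is 3.44 times the baseline.
toy_gain = relative_increase_pct(0.10, 0.344)
```

Because MCC ranges from -1 to 1 and can sit close to 0 for a weak baseline, a large relative increase in MCC can correspond to a modest absolute change, so the relative gains are best read alongside the absolute scores.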