Benchmarking of signaling networks generated by large language models

Jeevan Tewari
Benjamin W. Dahl
Jeffrey J. Saucerman

3 evaluations Published on Jul 29, 2025

Abstract

Computational models of signaling networks provide frameworks for predicting how molecular cues guide cell decisions, but they are typically limited by manual curation from incomplete literature. Here, we test whether general-purpose large language models (LLMs) can generate accurate models of signaling networks. We find that general-purpose LLMs generate 24-58% of the reactions in literature-curated networks for cardiomyocyte hypertrophy, myofibroblast activation, and mechano-signaling, and predict network responses to perturbations with accuracies of 5-26%. While current general-purpose LLMs generate signaling networks with limited accuracy, this study provides a pipeline and benchmarks to guide future improvements.
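As a rough illustration of the two kinds of benchmark reported above, the sketch below computes the fraction of curated reactions recovered by a generated network and the fraction of perturbation responses predicted correctly. It is not the authors' pipeline: the abstract does not state how reactions or responses are matched, so exact matching is an assumption here, and the function names, species names (A, B, C, ...) and toy data are all hypothetical.

```python
# Minimal sketch of two benchmark metrics; all names and data are hypothetical.
# Assumption: a reaction is a (source, target) pair and counts as recovered
# only on an exact match. The study's real matching criteria may differ.

def reaction_recall(curated, generated):
    """Fraction of curated reactions that also appear in the generated network."""
    curated_set = set(curated)
    if not curated_set:
        raise ValueError("curated network has no reactions")
    return len(curated_set & set(generated)) / len(curated_set)


def perturbation_accuracy(expected, predicted):
    """Fraction of expected perturbation responses that the model reproduces.

    Both arguments map a (perturbation, readout) pair to a qualitative
    response such as "increase", "decrease", or "no change".
    A missing prediction is scored as incorrect.
    """
    if not expected:
        raise ValueError("no expected responses to score against")
    correct = sum(
        1 for key, response in expected.items() if predicted.get(key) == response
    )
    return correct / len(expected)


# Toy data, not taken from the study.
curated_reactions = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
generated_reactions = [("A", "B"), ("B", "C"), ("B", "E")]

expected_responses = {
    ("inhibit A", "D"): "decrease",
    ("inhibit B", "C"): "decrease",
    ("activate A", "D"): "increase",
    ("inhibit C", "B"): "no change",
}
predicted_responses = {
    ("inhibit A", "D"): "decrease",
    ("inhibit B", "C"): "no change",
    ("activate A", "D"): "decrease",
}

recall = reaction_recall(curated_reactions, generated_reactions)
accuracy = perturbation_accuracy(expected_responses, predicted_responses)
print(f"reaction recall: {recall:.0%}")
print(f"perturbation accuracy: {accuracy:.0%}")
```

Recall is measured against the curated network only, so extra reactions in the generated network (such as the hypothetical B to E edge) do not lower it; a precision-style metric would be needed to penalize them.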
