Benchmarking of signaling networks generated by large language models
Abstract
Computational models of signaling networks provide frameworks for predicting how molecular cues guide cell decisions, but they are typically limited by manual curation from an incomplete literature. Here, we test whether general-purpose large language models (LLMs) can generate accurate models of signaling networks. We find that general-purpose LLMs recover 24-58% of the reactions in literature-curated networks for cardiomyocyte hypertrophy, myofibroblast activation, and mechano-signaling, and predict network responses to perturbations with accuracies of 5-26%. While current general-purpose LLMs generate signaling networks with limited accuracy, this study provides a pipeline and benchmarks to guide future improvements.
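The abstract reports two benchmark quantities: the fraction of literature-curated reactions recovered by an LLM-generated network, and the accuracy of predicted network responses to perturbations. The sketch below is a minimal, hypothetical illustration of how such metrics could be computed; the function names, data structures, and toy node names are assumptions for illustration, not the authors' pipeline.

```python
# Hypothetical sketch of the two benchmark metrics described in the abstract.
# Reactions are represented as (source, target) pairs; perturbation responses
# as qualitative labels per node. This is not the authors' implementation.

def reaction_recall(llm_reactions: set[tuple[str, str]],
                    curated_reactions: set[tuple[str, str]]) -> float:
    """Fraction of curated reactions recovered by the LLM-generated network."""
    if not curated_reactions:
        return 0.0
    return len(llm_reactions & curated_reactions) / len(curated_reactions)

def perturbation_accuracy(llm_responses: dict[str, str],
                          curated_responses: dict[str, str]) -> float:
    """Fraction of perturbation responses (e.g. 'increase'/'decrease'/'no change')
    where the LLM-derived model agrees with the curated model."""
    if not curated_responses:
        return 0.0
    matches = sum(llm_responses.get(node) == response
                  for node, response in curated_responses.items())
    return matches / len(curated_responses)

# Toy example with placeholder signaling nodes
curated = {("TGFB", "SMAD3"), ("SMAD3", "aSMA"), ("AngII", "ERK")}
llm = {("TGFB", "SMAD3"), ("AngII", "ERK"), ("TGFB", "ERK")}
print(reaction_recall(llm, curated))  # 2 of 3 curated reactions recovered
```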