Benchmarking of biochemical networks generated by large language models

This article has 3 evaluations

Abstract

Computational models of biochemical networks provide frameworks for predicting how molecular cues guide cell decisions. These models are typically limited by the time-intensive manual curation required to extract network mechanisms from incomplete literature. Here, we test whether general-purpose large language models (LLMs) can generate accurate models of signaling and metabolic networks. We find that general-purpose LLMs generate 24-65% of the reactions of literature-curated signaling networks for cardiomyocyte hypertrophy, myofibroblast activation, and mechanosignaling. Further, logic-based models built from these networks predict responses to perturbations with accuracies of 6-33%. In the context of metabolic modeling, LLMs generate 64-91% of the reactions within the core Escherichia coli metabolic network and show highly variable accuracy in predicting substrate utilization. Current general-purpose LLMs generate biochemical networks with moderate accuracy, and this study provides a pipeline and benchmarks to guide future improvements.
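
The two kinds of figures quoted in the abstract (the fraction of curated reactions that a generated network contains, and the accuracy of perturbation predictions) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the (source, target, sign) edge representation, the case-insensitive matching rule, and the toy species and experiments below are all assumptions made for illustration only.

```python
# Minimal sketch, NOT the study's pipeline. All names and data are hypothetical.

def normalize(reaction):
    """Canonicalize a (source, target, sign) edge so case and stray
    whitespace differences are not counted as missing reactions."""
    source, target, sign = reaction
    return (source.strip().upper(), target.strip().upper(), sign)


def reaction_recall(curated, generated):
    """Fraction of curated reactions that also appear in the generated network."""
    curated_set = {normalize(r) for r in curated}
    generated_set = {normalize(r) for r in generated}
    if not curated_set:
        raise ValueError("curated network is empty")
    return len(curated_set & generated_set) / len(curated_set)


def prediction_accuracy(expected, predicted):
    """Fraction of perturbation experiments whose predicted response
    matches the expected (literature-reported) response."""
    if not expected:
        raise ValueError("no experiments to score")
    matches = sum(1 for key, value in expected.items()
                  if predicted.get(key) == value)
    return matches / len(expected)


# Toy curated and LLM-generated networks (illustrative species only).
curated = [("AngII", "AT1R", "+"), ("AT1R", "Gq", "+"),
           ("Gq", "PLC", "+"), ("PLC", "PKC", "+")]
generated = [("angii", "at1r", "+"), ("AT1R", "Gq", "+"), ("Gq", "PKC", "+")]

# Toy perturbation experiments: (perturbation, readout) -> response.
expected = {("AngII", "PKC"): "increase", ("AT1R_KO", "PKC"): "decrease",
            ("Gq_KO", "PLC"): "decrease", ("AngII", "PLC"): "increase"}
predicted = {("AngII", "PKC"): "increase", ("AT1R_KO", "PKC"): "no change",
             ("Gq_KO", "PLC"): "no change"}

recall = reaction_recall(curated, generated)
accuracy = prediction_accuracy(expected, predicted)
print(f"reaction recall: {recall:.0%}, prediction accuracy: {accuracy:.0%}")
```

In this sketch an experiment that the generated model cannot score at all (a missing key in `predicted`) counts as a miss. That is one possible convention, and how the study itself handles such cases is not stated in the abstract.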
