Leveraging Large Language Models for Redundancy-Aware Pathway Analysis and Deep Biological Interpretation

Yifei Ge
Feifan Zhang
Yijiang Liu
Chao Jiang
Peng Gao
Nguan Soon Tan
Sai Zhang
Yuchen Shen
Qianyi Zhou
Xin Zhou
Chuchu Wang
Xiaotao Shen

Published on Aug 28, 2025

This article on Sciety

Abstract

Extracting coherent, biologically meaningful insights from vast, complex multi-omics data remains challenging. Currently, pathway enrichment analysis serves as a cornerstone for the functional interpretation of such data. However, conventional approaches often suffer from extensive functional redundancy caused by shared molecular components and overlapping pathway definitions across databases. This redundancy can obscure key biological signals and compromise the interpretability of pathway enrichment results. Here, we present MAPA (Functional Module Identification and Annotation for Pathway Analysis Results Using Large Language Models [LLM]), an open-source computational framework that resolves redundancy and enhances pathway analysis result interpretation. MAPA computes functional similarity between pathways using LLM-based text embeddings, enabling comparison across different databases. It constructs pathway similarity networks and identifies functional modules via community detection algorithms. Crucially, MAPA employs LLMs for automated functional annotation, integrating Retrieval-Augmented Generation (RAG) to generate comprehensive and real-time biological summaries and reduce hallucinations. Benchmarking demonstrated MAPA’s superior performance: the biotext embedding similarity showed a large effect size (Cliff’s δ = 0.96) compared with the Jaccard index (δ = 0.73), and module identification achieved high accuracy (Adjusted Rand Index [ARI] = 0.95) versus existing methods (ARI = 0.23-0.33). Human expert evaluation confirmed that MAPA’s annotations match expert-quality interpretations. Finally, a multi-omics aging case study illustrates that MAPA uncovers coherent functional modules and generates insights extending beyond conventional pathway analyses. Collectively, MAPA represents a significant advance in redundancy-aware pathway analysis, transforming pathway enrichment results from fragmented lists into biologically coherent narratives.
By leveraging the capabilities of LLMs, MAPA offers researchers a robust, scalable tool for deriving deep mechanistic insights from complex and vast multi-omics datasets, marking a new direction for AI-driven bioinformatics.
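
The core idea described above (embed pathway descriptions, link pathways whose embeddings are similar, then group the resulting network into modules) can be illustrated with a minimal sketch. This is not MAPA's implementation: the pathway names, the three-dimensional vectors, the 0.9 threshold, and the `find_modules` helper are all hypothetical, and connected components are used here as a simple stand-in for the community detection algorithms the abstract mentions. A real run would obtain the vectors from a text-embedding model.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings (assumption): in practice these would come
# from an LLM text-embedding model applied to pathway names/descriptions.
embeddings = {
    "GO: fatty acid oxidation": [0.9, 0.1, 0.0],
    "KEGG: fatty acid degradation": [0.8, 0.2, 0.1],
    "Reactome: mitochondrial beta-oxidation": [0.85, 0.15, 0.05],
    "GO: DNA repair": [0.0, 0.9, 0.3],
    "KEGG: base excision repair": [0.1, 0.8, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_modules(embeddings, threshold=0.9):
    """Group pathways whose embedding similarity meets the threshold.

    Builds an implicit similarity network (an edge wherever cosine
    similarity >= threshold) and returns its connected components,
    tracked with a union-find structure. The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    parent = {name: name for name in embeddings}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    modules = {}
    for name in embeddings:
        modules.setdefault(find(name), []).append(name)
    # Sort members and modules so the output order is deterministic.
    return sorted((sorted(m) for m in modules.values()), key=lambda m: m[0])


modules = find_modules(embeddings)
for i, members in enumerate(modules, 1):
    print(f"Module {i}: {members}")
```

With these toy vectors, the three lipid-related entries from different databases fall into one module and the two DNA-repair entries into another, which is the kind of cross-database grouping that overlap-based measures such as the Jaccard index can miss when pathways share few annotated genes.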
