Leveraging Large Language Models for Redundancy-Aware Pathway Analysis and Deep Biological Interpretation
Abstract
Extracting coherent, biologically meaningful insights from large, complex multi-omics data remains challenging. Pathway enrichment analysis is a cornerstone of functional interpretation for such data. However, conventional approaches often suffer from extensive functional redundancy caused by shared molecular components and overlapping pathway definitions across databases. This redundancy can obscure key biological signals and compromise the interpretability of enrichment results. Here, we present MAPA (Functional Module Identification and Annotation for Pathway Analysis Results Using Large Language Models [LLMs]), an open-source computational framework that resolves redundancy and improves the interpretation of pathway analysis results. MAPA computes functional similarity between pathways using LLM-based text embeddings, which enables comparison across different databases. It then constructs pathway similarity networks and identifies functional modules via community detection algorithms. Crucially, MAPA employs LLMs for automated functional annotation, integrating Retrieval-Augmented Generation (RAG) to generate comprehensive, up-to-date biological summaries and to reduce hallucinations. Benchmarking demonstrated MAPA’s superior performance: biotext embedding similarity showed a larger effect size (Cliff’s δ = 0.96) than the Jaccard index (δ = 0.73), and module identification achieved high accuracy (Adjusted Rand Index [ARI] = 0.95) compared with existing methods (ARI = 0.23–0.33). Human expert evaluation confirmed that MAPA’s annotations match expert-quality interpretations. Finally, a multi-omics aging case study illustrates that MAPA uncovers coherent functional modules and generates insights extending beyond conventional pathway analyses. Collectively, MAPA represents a significant advance in redundancy-aware pathway analysis, transforming pathway enrichment results from fragmented lists into biologically coherent narratives. By leveraging the capabilities of LLMs, MAPA offers researchers a robust, scalable tool for deriving deep mechanistic insights from large, complex multi-omics datasets, marking a new direction for AI-driven bioinformatics.
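The embedding, network, and module steps described above can be illustrated with a minimal sketch. It is not MAPA's implementation: the pathway labels and three-dimensional vectors are made-up stand-ins for LLM text embeddings, the 0.8 similarity cutoff is an arbitrary assumption, and connected components are used as a simple substitute for a community detection algorithm.

```python
import math

# Hypothetical toy "embeddings"; real ones would come from an LLM
# text-embedding model applied to pathway names and descriptions.
embeddings = {
    "GO: DNA repair": [0.9, 0.1, 0.0],
    "KEGG: Base excision repair": [0.8, 0.2, 0.1],
    "Reactome: DNA double-strand break repair": [0.85, 0.15, 0.05],
    "GO: Fatty acid oxidation": [0.1, 0.9, 0.2],
    "KEGG: Fatty acid degradation": [0.0, 0.95, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_network(emb, threshold):
    """Connect two pathways when their cosine similarity reaches the threshold."""
    names = list(emb)
    edges = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(emb[a], emb[b]) >= threshold:
                edges[a].add(b)
                edges[b].add(a)
    return edges


def connected_components(edges):
    """Group pathways into modules (a simple stand-in for community detection)."""
    seen, modules = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        modules.append(component)
    return modules


# The 0.8 cutoff is an assumption chosen for this toy example.
modules = connected_components(build_network(embeddings, threshold=0.8))
for index, module in enumerate(modules, start=1):
    print(f"Module {index}: {sorted(module)}")
```

In this toy data the repair-related pathways from three databases fall into one module and the fatty-acid pathways into another, even though the labels share no gene-level information. That is the kind of cross-database grouping the text-embedding approach is meant to enable.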