Integrative Transcriptomic and Machine Learning Analysis Reveals a Robust Gene Signature in Parkinson’s Disease
Abstract
Parkinson’s disease (PD) is a progressive neurodegenerative disorder characterized by dopaminergic neuron loss in the substantia nigra and the accumulation of misfolded α-synuclein, leading to motor and non-motor symptoms. Early and accurate diagnosis remains challenging because symptoms emerge gradually and the disease is heterogeneous. In this study, we combined brain-region–specific transcriptomic profiling with interpretable machine learning to identify robust, biologically meaningful predictive gene signatures for PD. Differential gene expression analysis of post-mortem PD and control brain samples revealed a set of significantly dysregulated genes, predominantly involved in mitochondrial function, synaptic signaling, and calcium-mediated neuronal processes. Using Random Forest, Logistic Regression, and XGBoost classifiers, we derived a core set of seven genes (CALM1, DCLK1, FGF13, HMGN2, PRKACB, SV2C, TAC1) that overlapped across models and consistently contributed to PD classification. Feature importance and model interpretability analyses highlighted these genes as key drivers of predictive performance. The final gene set achieved a cross-validated ROC-AUC of 0.845 and was further validated on an independent external dataset (ROC-AUC = 0.743), indicating that the signature generalizes across cohorts, although with lower performance than in cross-validation. Functional enrichment analysis linked these genes to neuronal signaling, synaptic function, and Parkinson’s disease–relevant pathways, providing mechanistic context for their predictive power. Overall, our integrative approach illustrates the potential of combining transcriptomics with explainable machine learning to generate reliable, interpretable molecular biomarkers for PD, which may facilitate early diagnosis and improve understanding of disease biology.
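The abstract does not state how the overlapping gene set was computed. As a minimal sketch, one plausible reading is that each model ranks genes by importance, the top k genes per model are kept, and the core set is their intersection. The Python below illustrates that idea, along with a rank-based ROC-AUC. The function names, the choice of k, the placeholder gene names, and all importance scores are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Set


def top_k_features(importance: Dict[str, float], k: int) -> Set[str]:
    """Return the k features with the largest absolute importance."""
    ranked = sorted(importance, key=lambda g: abs(importance[g]), reverse=True)
    return set(ranked[:k])


def consensus_features(per_model: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Intersect the top-k feature sets of every model (sorted for stable output)."""
    top_sets = [top_k_features(scores, k) for scores in per_model.values()]
    return sorted(set.intersection(*top_sets))


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented placeholder scores, NOT values from the study.
# Logistic regression coefficients can be negative, hence the abs() ranking above.
toy_importances = {
    "random_forest": {"GENE_A": 0.30, "GENE_B": 0.25, "GENE_C": 0.20, "GENE_D": 0.05},
    "logistic_regression": {"GENE_A": -1.2, "GENE_B": 0.9, "GENE_C": 0.1, "GENE_D": 0.8},
    "xgboost": {"GENE_A": 0.40, "GENE_B": 0.35, "GENE_C": 0.02, "GENE_D": 0.23},
}

# Genes that appear in the top 3 of all three toy rankings.
core = consensus_features(toy_importances, k=3)
```

In this toy example only GENE_A and GENE_B appear in every model's top three, so they form the consensus set. In practice the importance scores would come from fitted classifiers, and the AUC would be computed on held-out folds or an external cohort rather than on the training data.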