Machine Learning-Based Identification of Survival-Associated CpG Biomarkers in Pancreatic Ductal Adenocarcinoma
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is an exceptionally aggressive cancer with a 5-year survival rate of less than 10%, driven by late-stage diagnosis, limited treatment options, and a lack of reliable biomarkers for early detection and prognosis. In this study, we integrated DNA methylation data from TCGA and ICGC cohorts, categorizing samples based on survival time, and identified 684 differentially methylated CpG sites, along with 224 CpG biomarkers significantly associated with patient survival through statistical and machine learning-based analyses. We developed a random forest model to predict patient survival, achieving 85.2% accuracy for short-survival patients and 70.0% for long-survival patients in the validation set. External dataset validation further confirmed the model’s robustness and accuracy. De novo motif analysis of genomic regions surrounding the 224 CpG biomarkers identified TWIST1 and FOXA2 as key transcriptional regulators enriched in survival-associated CpG sites, linking their activity to patient survival outcomes. Collectively, our findings highlight valuable epigenetic biomarkers and provide a predictive model to assess PDAC risk levels post-surgery, offering the potential for improved patient stratification and personalized therapeutic strategies.