Predicting Inpatient Risk of Mortality in Diabetic Patients Using Administrative Data and Machine Learning: An External Validation Study Using SPARCS
Abstract
Objectives
To evaluate whether machine learning models trained solely on administrative and demographic data can predict the inpatient APR-DRG Risk of Mortality class in diabetic patients.
Design
Retrospective cohort study with temporally distinct training and validation cohorts, using New York State SPARCS data from 2021 and 2022.
Setting
Hospitals reporting to the New York Statewide Planning and Research Cooperative System (SPARCS), 2021 and 2022.
Participants
Adult inpatient admissions (age ≥18) with a diagnosis of diabetes mellitus.
Primary outcome measure
APR-DRG Risk of Mortality (ROM), classified as Minor, Moderate, Major, or Extreme.
Results
XGBoost outperformed logistic regression and random forest on every reported metric. On the 2022 validation set, it achieved the highest accuracy (46.5%), macro-averaged AUC (0.699), and weighted F1-score (0.458), and the lowest Brier score for the Extreme class (0.052). SHAP analysis identified length of stay, age group, and payer type as key predictors.
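The metrics reported above can be illustrated with a minimal sketch. It assumes that the macro AUC is an unweighted mean of one-vs-rest AUCs and that the Brier score for the Extreme class is computed one-vs-rest; the abstract does not state either convention. The function names and the toy data are hypothetical and are not taken from the study.

```python
# Illustrative only: pure-Python versions of the reported metric types.
# Assumption: one-vs-rest treatment of each ROM class; names are hypothetical.
CLASSES = ["Minor", "Moderate", "Major", "Extreme"]


def predict(probs):
    """Pick the class with the highest predicted probability in each row."""
    return [CLASSES[max(range(len(p)), key=p.__getitem__)] for p in probs]


def accuracy(y_true, y_pred):
    """Fraction of admissions whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def brier_one_vs_rest(y_true, probs, cls):
    """Mean squared gap between P(cls) and the 0/1 indicator for cls."""
    k = CLASSES.index(cls)
    return sum((p[k] - (t == cls)) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def auc_one_vs_rest(y_true, probs, cls):
    """Probability that a cls case outranks a non-cls case (ties count half)."""
    k = CLASSES.index(cls)
    pos = [p[k] for t, p in zip(y_true, probs) if t == cls]
    neg = [p[k] for t, p in zip(y_true, probs) if t != cls]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, probs):
    """Unweighted mean of the per-class one-vs-rest AUCs."""
    return sum(auc_one_vs_rest(y_true, probs, c) for c in CLASSES) / len(CLASSES)


# Toy example with four admissions (made-up probabilities).
y_true = ["Minor", "Moderate", "Major", "Extreme"]
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
    [0.1, 0.1, 0.2, 0.6],
]
y_pred = predict(probs)
```

A low Brier score for a rare class such as Extreme can partly reflect the low prevalence of that class, so it is best read alongside the AUC and F1-score rather than on its own.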
Conclusions
Even without clinical data, administrative features contain non-random signals relevant for mortality risk stratification. These models, especially XGBoost, may help hospitals flag high-risk patients early using routinely available data, aiding triage and planning before labs or vitals are available.
Strengths and limitations of this study
This study is one of the first to apply machine learning to publicly available SPARCS data to predict APR-DRG Risk of Mortality in diabetic inpatients.
We evaluated three models using temporally distinct training and validation cohorts, simulating real-world model deployment across calendar years.
Model interpretability was addressed using SHAP, which gives transparent insight into feature contributions and supports clinician-facing explanations.
The models relied solely on administrative and demographic data; the absence of clinical features such as laboratory values or vital signs limits predictive performance.
Risk of Mortality labels were derived from APR-DRG software and may be influenced by coding practices rather than objective clinical outcomes.
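The temporally distinct training and validation cohorts mentioned above could be formed along the following lines. This is a sketch under stated assumptions: the abstract names 2022 as the validation year, so 2021 is assumed here to be the training year, and the field name `discharge_year` is a hypothetical placeholder rather than a confirmed SPARCS column name.

```python
# Illustrative only: split admission records by calendar year so that no
# validation-year record is seen during training.
# Assumptions: training year 2021, validation year 2022, hypothetical field name.
def temporal_split(records, train_year=2021, valid_year=2022,
                   year_key="discharge_year"):
    """Return (train, valid) lists of records, partitioned by calendar year."""
    train = [r for r in records if r[year_key] == train_year]
    valid = [r for r in records if r[year_key] == valid_year]
    return train, valid


# Toy records (made up); "rom" holds the Risk of Mortality label.
records = [
    {"discharge_year": 2021, "rom": "Minor"},
    {"discharge_year": 2021, "rom": "Major"},
    {"discharge_year": 2022, "rom": "Extreme"},
]
train, valid = temporal_split(records)
```

Splitting by year rather than at random means the validation set reflects any shift in case mix or coding practice between years, which is closer to how a deployed model would be used.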