A Supervised Learning Approach to Classify Clinical Outcomes in Helicobacter pylori-positive Patients

Abstract

Background

Helicobacter pylori (H. pylori) is a globally prevalent gastric pathogen implicated in a range of clinical outcomes, from gastritis to gastric cancer. While chronic gastritis is a nearly universal outcome among infected individuals, only a small subset progresses to severe disease such as gastric cancer; which outcome develops is governed by a mixture of bacterial, host and environmental factors. This study presents a supervised learning framework that uses a combination of clinical and genome-derived features to predict whether an H. pylori infection will result in gastric cancer or a non-malignant outcome.

Methods

A dataset of 1,363 H. pylori genomes with host metadata and annotated genomic features was curated. Features were extracted from gene presence/absence profiles, sequence descriptors, and variant-level annotations. A white-box logistic regression model and two black-box models, an eXtreme Gradient Boosting (XGBoost) classifier and a Random Forest, were trained. Synthetic Minority Over-sampling Technique for Nominal and Continuous (SMOTE-NC) was used to address class imbalance, and SHapley Additive exPlanations (SHAP) values were used to assess model interpretability.
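
To illustrate how SMOTE-NC handles mixed feature types, the following is a minimal, simplified sketch of its core idea in plain Python. It is not the implementation used in the study. The function name `smote_nc_sketch`, the toy rows (age, an anonymised region label, a variant flag) and all parameter values are assumptions made up for illustration. Continuous features are interpolated between a minority sample and one of its nearest minority neighbours, while categorical features take the most common value among those neighbours.

```python
import random
import statistics
from collections import Counter


def smote_nc_sketch(minority, cont_idx, cat_idx, n_new, k=3, seed=0):
    """Generate n_new synthetic minority rows (simplified SMOTE-NC idea)."""
    rng = random.Random(seed)
    # A categorical mismatch costs the median of the standard deviations
    # of the continuous features, computed over the minority class.
    penalty = statistics.median(
        [statistics.pstdev([row[i] for row in minority]) for i in cont_idx]
    )

    def distance(a, b):
        cont = sum((a[i] - b[i]) ** 2 for i in cont_idx)
        mismatches = sum(1 for i in cat_idx if a[i] != b[i])
        return (cont + mismatches * penalty ** 2) ** 0.5

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: distance(base, row),
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        new_row = list(base)
        for i in cont_idx:
            # Continuous feature: a point on the segment base -> partner.
            new_row[i] = base[i] + gap * (partner[i] - base[i])
        for i in cat_idx:
            # Categorical feature: most frequent value among the neighbours.
            new_row[i] = Counter(r[i] for r in neighbours).most_common(1)[0][0]
        synthetic.append(new_row)
    return synthetic


# Invented toy minority class: [host age, region label, variant present].
minority = [
    [61.0, "region_A", 1],
    [58.0, "region_A", 1],
    [70.0, "region_B", 0],
    [66.0, "region_A", 1],
    [73.0, "region_B", 1],
]
synthetic = smote_nc_sketch(minority, cont_idx=[0], cat_idx=[1, 2], n_new=4)
for row in synthetic:
    print(row)
```

Because categorical values are chosen by majority vote rather than interpolated, every synthetic row carries only category values that were actually observed, which is the reason this variant suits mixed clinical and genomic features.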

Results

The baseline model achieved an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.89, while the XGBoost and Random Forest models each achieved an AUROC of 0.95. The black-box models also improved recall for the gastric cancer group: 0.83 for XGBoost and 0.82 for Random Forest, compared with 0.73 for the baseline model. Feature importance analyses identified key drivers of gastric cancer prediction, including host age, geographic origin and specific disruptive mutations. These findings support the use of machine learning for biomarker-guided decision-making in gastric disease risk assessment.
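
For readers less familiar with the two metrics reported here, the sketch below shows how recall and AUROC can be computed from labels and predicted scores. The labels, scores and the 0.5 decision threshold are invented toy values, not data from the study, and the function names are illustrative assumptions.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auroc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented toy example: 1 = gastric cancer, 0 = non-malignant outcome.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"recall = {recall(y_true, y_pred):.3f}")
print(f"AUROC  = {auroc(y_true, scores):.3f}")
```

Recall depends on the chosen decision threshold, whereas AUROC summarises ranking quality across all thresholds, which is why two models can share an AUROC yet differ in recall.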

Conclusions

The black-box models outperformed the baseline model, with recall values greater than 0.8. Age was the top predictor of gastric cancer across all models, and a variety of genomic-level features also contributed. Further analysis of the contributing features to test for causation could expand current research on the links between H. pylori and gastric cancer.
