Machine Learning-Based Identification of Sickle Cell Disease Subphenotypes in Clinical Trial Data

Abstract

Sickle Cell Disease (SCD) is a rare autosomal recessive disorder caused by a point mutation that produces abnormal hemoglobin S, leading to deformed red blood cells and a wide range of clinical manifestations, including pain crises, organ damage, and an increased risk of infection. These devastating complications often result in substantial morbidity and early mortality and pose major therapeutic challenges. Currently, there is a lack of clinically validated predictive tools to assess individual SCD patients’ prognoses and therapeutic responses, largely because the clinical manifestations are complex and vary widely among patients. As a result, there remains an unmet need for a systematic approach to SCD subphenotype classification that can guide and tailor therapeutic strategies, predict outcomes, and improve patients’ lives. Over a decade ago, two clinical subphenotypes of SCD were proposed based on the literature and clinical observations. However, this concept has not been applied or explored in the design of clinical trials (CTs). Recent advances in machine learning (ML) applications in medicine, together with the growing availability of SCD clinical trial data on therapeutics that target different pathophysiologic aspects of the disease, provide an opportunity to better understand therapeutic responses within SCD populations. Applying ML techniques to a large CT database could support the development of robust disease models capable of identifying and validating disease subphenotypes, with the potential to predict outcomes of specific therapies based on their mechanism of action and to optimize care in SCD.

In this study, we constructed a comprehensive database comprising 3,551 patients with SCD from 16 clinical trials that supported therapeutic approvals for SCD. Using this database, we applied a machine learning pipeline to develop a rule-based classification method that identified two distinct clinical subphenotypes of SCD: the Vaso-occlusive Primary (VP) subphenotype, characterized primarily by a higher frequency of vaso-occlusive pain crises, and the Hemolytic Dominant (HD) subphenotype, characterized by chronic hemolysis and its associated complications. Biomarker comparisons showed that the VP subphenotype was associated with a significantly higher annual rate of vaso-occlusive crisis events, significantly higher total and fetal hemoglobin levels, and leukocytosis, whereas the HD subphenotype exhibited significantly higher levels of hemolysis-related biomarkers such as indirect bilirubin. The biomarker profiles were validated in an independent clinical trial dataset, which confirmed the two subphenotypes.
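The abstract does not spell out the classification rules, so the following Python sketch is only a rough illustration of what a rule-based two-subphenotype assignment followed by a biomarker comparison might look like. Everything in it is an assumption: the feature names (`voc_per_year`, `indirect_bilirubin`, `total_hb`, `fetal_hb`), the use of cohort medians as cut-offs, the "unassigned" category for discordant profiles, and the five-patient cohort, which is synthetic. It is not the authors' rule set, variables, or data, and it reports only descriptive group means rather than the significance tests behind the reported results.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Patient:
    # Hypothetical baseline features; names and units are illustrative only.
    pid: str
    voc_per_year: float        # annualized vaso-occlusive crisis rate
    indirect_bilirubin: float  # hemolysis-related marker
    total_hb: float            # total hemoglobin
    fetal_hb: float            # fetal hemoglobin (HbF)


def assign_subphenotype(p: Patient, voc_cut: float, bili_cut: float) -> str:
    """Toy rule: high VOC burden with low hemolysis -> VP; the reverse -> HD."""
    high_voc = p.voc_per_year > voc_cut
    high_hemolysis = p.indirect_bilirubin > bili_cut
    if high_voc and not high_hemolysis:
        return "VP"
    if high_hemolysis and not high_voc:
        return "HD"
    return "unassigned"  # discordant or intermediate profiles stay unlabeled


def classify_cohort(patients: list[Patient]) -> dict[str, str]:
    # Cohort medians act as data-driven cut-offs (an assumption, not the study's rule).
    voc_cut = median(p.voc_per_year for p in patients)
    bili_cut = median(p.indirect_bilirubin for p in patients)
    return {p.pid: assign_subphenotype(p, voc_cut, bili_cut) for p in patients}


def group_means(patients: list[Patient], labels: dict[str, str], attr: str) -> dict[str, float]:
    """Mean of one biomarker within each assigned subphenotype (descriptive only)."""
    out = {}
    for label in ("VP", "HD"):
        values = [getattr(p, attr) for p in patients if labels[p.pid] == label]
        if values:
            out[label] = mean(values)
    return out


# Synthetic toy cohort, not study data.
cohort = [
    Patient("P1", 4.0, 1.2, 9.5, 12.0),
    Patient("P2", 0.5, 3.8, 7.4, 4.0),
    Patient("P3", 3.0, 1.0, 9.0, 10.0),
    Patient("P4", 1.0, 4.5, 7.0, 3.0),
    Patient("P5", 2.0, 2.5, 8.0, 6.0),
]
labels = classify_cohort(cohort)

if __name__ == "__main__":
    print(labels)
    # {'P1': 'VP', 'P2': 'HD', 'P3': 'VP', 'P4': 'HD', 'P5': 'unassigned'}
    print(group_means(cohort, labels, "fetal_hb"))
    # {'VP': 11.0, 'HD': 3.5}
```

Leaving borderline patients unassigned keeps the toy rule conservative; a real pipeline would need validated features and thresholds, plus a formal statistical test for each biomarker comparison.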

Our study demonstrated that integrating ML with knowledge of disease pathophysiology enables robust identification of clinically meaningful SCD subphenotypes from an international clinical trial database. This approach provides a basis for developing predictive disease models that may optimize treatment strategies and improve patient outcomes. Furthermore, our methodological framework offers a scalable model for identifying patient subsets in other rare genetic diseases.