Automated Longitudinal MRI Assessments of Diffuse Midline Gliomas: A Large Multi-Institutional Study
Abstract
Background: Diffuse midline gliomas (DMGs) are highly aggressive brain tumors of children and young adults, with poor prognosis and limited treatment options. Accurate imaging assessment of tumor progression is critical for evaluating therapeutic response and guiding clinical decisions, but it is subject to reader bias. This study compares two automated deep learning (DL) strategies for longitudinal segmentation and subsequent classification of tumor progression in DMG.
Methods: We retrospectively analyzed longitudinal imaging from 155 DMG patients from UCSF hospitals and an external cohort of 18 patients from the Pediatric Neuro-Oncology Consortium. Two publicly available DL models were evaluated: (1) a “longitudinal” model trained to directly segment areas of change from two consecutive time points, and (2) an “independent” state-of-the-art volumetric segmentation model that processes each time point individually and then computes the change mask between the independent segmentations of consecutive time points. Labels were derived from a manual review of radiology reports and categorized as “Increased,” “Decreased,” or “Stable” tumor size based on the interpreting neuroradiologist’s assessment. Reference standard segmentations of change volumes were manually created in a subset of patients. Model segmentation outputs were compared using Dice score, volumetric mean absolute error (MAE), and volume coefficient of determination (R²) in the internal and external sets. For comparison with radiologist assessment, classification accuracy, sensitivity, specificity, and area under the ROC curve (ROC AUC) were assessed.
Findings: The longitudinal model demonstrated superior classification performance, with ROC AUCs of 0.93, 0.90, and 0.83 for the increased, decreased, and stable tumor change classes, respectively, in the internal set. It achieved an MAE of 7.41 ± 23.04 cm³ and R² of 0.25 internally, and an MAE of 8.27 ± 16.63 cm³ and R² of 0.53 externally. Median Dice scores were 0.76 (IQR: 0.50–0.88) and 0.68 (IQR: 0.41–0.92) in the internal and external datasets, respectively. In comparison, the independent model showed lower classification ROC AUCs (0.83, 0.89, and 0.80), higher MAEs (11.03 ± 21.24 cm³ internally; 10.86 ± 17.51 cm³ externally), and lower median Dice scores (0.40 [IQR: 0.25–0.59] internally; 0.62 [IQR: 0.33–0.83] externally).
Interpretation: For assessing changes in DMG tumor size, the longitudinal DL model generally outperforms the independent model in both the internal and external cohorts. These findings support integrating dedicated longitudinal AI tools for more objective and reproducible tumor assessments.
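To make the “independent” strategy and the segmentation metrics concrete, the following is a minimal Python sketch and not the study's code. It assumes binary tumor masks stored as NumPy arrays on a shared voxel grid. All function names are hypothetical. The rule that maps net volume change to “Increased,” “Decreased,” or “Stable,” including the 0.5 cm³ threshold, is an illustrative assumption, since the abstract does not state how classes were derived from the masks.

```python
import numpy as np

def change_masks(seg_prev, seg_curr):
    """Return (growth, shrinkage) masks from two independent binary segmentations."""
    prev = np.asarray(seg_prev, dtype=bool)
    curr = np.asarray(seg_curr, dtype=bool)
    growth = curr & ~prev      # tumor voxels present now but not before
    shrinkage = prev & ~curr   # tumor voxels present before but not now
    return growth, shrinkage

def dice(mask_a, mask_b):
    """Dice overlap between two binary masks (defined as 1.0 if both are empty)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)

def volume_cm3(mask, voxel_volume_mm3=1.0):
    """Mask volume in cm^3, given the volume of one voxel in mm^3."""
    return float(np.asarray(mask, dtype=bool).sum() * voxel_volume_mm3 / 1000.0)

def mean_absolute_error(pred_volumes, ref_volumes):
    """Mean absolute difference between predicted and reference volumes."""
    pred = np.asarray(pred_volumes, dtype=float)
    ref = np.asarray(ref_volumes, dtype=float)
    return float(np.mean(np.abs(pred - ref)))

def classify_change(growth, shrinkage, voxel_volume_mm3=1.0, threshold_cm3=0.5):
    """Hypothetical rule: label the net volume change against an assumed threshold."""
    net = volume_cm3(growth, voxel_volume_mm3) - volume_cm3(shrinkage, voxel_volume_mm3)
    if net > threshold_cm3:
        return "Increased"
    if net < -threshold_cm3:
        return "Decreased"
    return "Stable"

# Toy example: a 4x4x4 tumor that extends by two slices along one axis.
prev = np.zeros((10, 10, 10), dtype=bool)
curr = np.zeros((10, 10, 10), dtype=bool)
prev[2:6, 2:6, 2:6] = True
curr[2:8, 2:6, 2:6] = True
growth, shrinkage = change_masks(prev, curr)
label = classify_change(growth, shrinkage, voxel_volume_mm3=1000.0)
```

A longitudinal model, by contrast, would predict the change mask directly from both time points, so segmentation errors at each time point are not compounded by the subtraction step.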