Performance and Robustness of Machine Learning-based Radiomic COVID-19 Severity Prediction

Stephen S.F. Yip
Zan Klanecek
Shotaro Naganawa
John Kim
Andrej Studen
Luciano Rivetti
Robert Jeraj

Published on Sep 9, 2020

Abstract

Objectives

This study investigated the performance and robustness of radiomics in predicting COVID-19 severity in a large public cohort.

Methods

A public dataset of 1110 COVID-19 patients (one CT scan per patient) was used. Using the CT scans and clinical data, each patient was classified as mild, moderate, or severe by two observers: (1) the dataset provider and (2) a board-certified radiologist. For each CT scan, 107 radiomic features were extracted. The dataset was randomly divided into a training set (60%) and a holdout validation set (40%). During training, features were selected and combined into a logistic regression model for distinguishing severe cases from mild and moderate cases. Models were trained and validated on the classifications of both observers. The area under the receiver operating characteristic curve (AUC) quantified the predictive power of the models. To determine model robustness, each trained model was cross-validated on the other observer's classifications.
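The workflow described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, a single made-up feature stands in for the 107 radiomic features, feature selection is omitted, and the names `labels_a`, `labels_b`, `fit_logistic`, and `auc` are hypothetical. The label-noise rates for the two simulated observers are arbitrary choices. "Cross-validation" is read here as scoring a model trained on one observer's labels against the other observer's labels on the holdout set.

```python
import math
import random

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_logistic(x, y, lr=0.1, epochs=500):
    """Single-feature logistic regression fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(w * xi + b)))
            grad_w += (p - yi) * xi
            grad_b += p - yi
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(model, x):
    """Predicted probability of the severe class for each feature value."""
    w, b = model
    return [1.0 / (1.0 + math.exp(-(w * xi + b))) for xi in x]

rng = random.Random(0)
n = 1110  # cohort size stated in the abstract; everything below is synthetic

# Assumption: a hidden "true" severity drives one feature, and each observer
# reports it with a different amount of label noise.
true_severe = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
feature = [rng.gauss(1.5 * s, 1.0) for s in true_severe]

def noisy(labels, flip):
    """Flip each label independently with probability `flip`."""
    return [1 - y if rng.random() < flip else y for y in labels]

labels_a = noisy(true_severe, 0.05)  # stands in for observer 1
labels_b = noisy(true_severe, 0.15)  # stands in for observer 2

# Random 60% / 40% split into training and holdout sets.
idx = list(range(n))
rng.shuffle(idx)
cut = n * 60 // 100
train, hold = idx[:cut], idx[cut:]

x_train = [feature[i] for i in train]
x_hold = [feature[i] for i in hold]
observers = {"A": labels_a, "B": labels_b}

# One model per observer, each fitted on that observer's training labels.
models = {name: fit_logistic(x_train, [lab[i] for i in train])
          for name, lab in observers.items()}

# Score every model against both observers' holdout labels.
results = {}
for trained_on, model in models.items():
    scores = predict(model, x_hold)
    for scored_on, lab in observers.items():
        results[(trained_on, scored_on)] = auc(scores, [lab[i] for i in hold])

for (trained_on, scored_on), value in sorted(results.items()):
    print(f"trained on {trained_on}, scored on {scored_on}: AUC = {value:.3f}")
```

The off-diagonal entries of `results` (trained on one observer, scored on the other) are the quantities of interest for robustness; a large gap between them and the within-observer entries would indicate sensitivity to the choice of observer.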

Results

A single feature alone was sufficient to predict mild from severe COVID-19 with [AUC value unavailable] and [AUC value unavailable] (p << 0.01). The most predictive features were the distribution of small size-zones (GLSZM-SmallAreaEmphasis) for the provider's classification and the linear dependency of neighboring voxels (GLCM-Correlation) for the radiologist's classification. Cross-validation showed that both [AUC values unavailable]. In predicting moderate from severe COVID-19, first-order-Median alone had sufficient predictive power of [AUC value unavailable]. For the radiologist's classification, the predictive power of the model increased to [AUC value unavailable] as the number of features grew from 1 to 5. Cross-validation yielded [AUC value unavailable] and [AUC value unavailable].
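One simple way to identify a single most predictive feature is to rank every feature by its univariate AUC. The sketch below illustrates that idea on synthetic data; it is not necessarily the selection procedure used in the study, and the feature names (`feature_strong`, `feature_weak`, `feature_noise`) and the helper `rank_features` are hypothetical. Because a feature may be lower rather than higher in severe cases, AUCs below 0.5 are mirrored so that the ranking reflects discriminative power regardless of direction.

```python
import random

def auc(values, labels):
    """Probability that a random positive case has a higher value than a random negative case."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_features(table, labels):
    """Return (name, AUC) pairs sorted from most to least discriminative."""
    ranked = []
    for name, values in table.items():
        a = auc(values, labels)
        ranked.append((name, max(a, 1.0 - a)))  # mirror so direction does not matter
    return sorted(ranked, key=lambda item: item[1], reverse=True)

rng = random.Random(1)
labels = [1] * 200 + [0] * 200  # synthetic: 1 = severe, 0 = mild

# Synthetic feature table: each feature is Gaussian with a class-dependent mean shift.
shifts = {"feature_strong": -2.0, "feature_weak": 0.5, "feature_noise": 0.0}
table = {name: [rng.gauss(shift * y, 1.0) for y in labels]
         for name, shift in shifts.items()}

ranked = rank_features(table, labels)
for name, value in ranked:
    print(f"{name}: AUC = {value:.3f}")
```

Ranking on the training set only, and then reporting the AUC of the chosen feature on the holdout set, avoids an optimistic bias from selecting and evaluating on the same cases.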

Conclusions

Radiomics significantly predicted different levels of COVID-19 severity. The prediction was moderately sensitive to inter-observer classifications, and thus needs to be used with caution.

Key points

Interpretable radiomic features can predict different levels of COVID-19 severity
Machine Learning-based radiomic models were moderately sensitive to inter-observer classifications, and thus need to be used with caution
