An Integrated Framework with Machine Learning and Radiomics for Accurate and Rapid Early Diagnosis of COVID-19 from Chest X-ray
Abstract
Early diagnosis of COVID-19 is considered the first key action in preventing the spread of the virus. Currently, reverse transcription-polymerase chain reaction (RT-PCR) is considered the gold-standard point-of-care diagnostic tool. However, RT-PCR has several known limitations, e.g., low sensitivity, cost, long delays in obtaining results, and the need for a professional technician to collect samples. Chest X-ray (CXR), by contrast, is routinely used as a cost-effective test for diagnosing and monitoring various respiratory abnormalities and is currently being used as a discriminating tool for COVID-19. However, visual assessment of CXR cannot distinguish COVID-19 from other lung conditions. Several machine learning algorithms have been proposed to detect COVID-19 directly from CXR images, with reasonably good accuracy reported on data sets that were randomly split into training and test subsets. Because these methods require a very large number of training images, data augmentation with geometric transformations was applied to increase the number of images. It is highly likely that images of the same patients are present in both the training and test sets, resulting in higher reported accuracies for COVID-19 detection. It is, therefore, vital to assess the performance of a COVID-19 detection algorithm on an independent data set covering different degrees of the disease before it is employed in clinical settings. In contrast, machine learning techniques that rely on handcrafted feature extraction and selection can be trained on smaller data sets, and the features can be analyzed separately for various lung conditions. Radiomics features are handcrafted features of this kind: they quantitatively represent the heterogeneous appearance of the lung on CXR and can be used to distinguish COVID-19 from other lung conditions.
Based on this hypothesis, a machine learning-based technique is proposed here that is trained on a set of 71 suitable radiomics features to detect COVID-19. Support Vector Machine (SVM) and Ensemble Bagging Model Trees (EBM) trained on these 71 radiomics features are found to distinguish COVID-19 from other diseases with overall sensitivities of 99.6% and 87.8% and specificities of 85% and 97%, respectively. Although the performance of the two methods is comparable, EBM is more robust across severity levels. Severity, which represents the degree of the disease, was scored from 0 to 4 by two experienced radiologists for each lung segment of each CXR image. For severity 0, the sensitivity and specificity of the EBM method are 91.7% and 100%, respectively, indicating that there are certain radiomics patterns that are not visibly distinguishable. Since the proposed method does not require any manual intervention (e.g., sample collection), it can be integrated with any standard X-ray reporting system and used as an efficient, cost-effective, and rapid early-diagnosis device. It can also be deployed in places where quick COVID-19 test results are required, e.g., airports, seaports, hospitals, and health clinics.
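Two ideas in the abstract can be made concrete with a short sketch: keeping every image of a patient on one side of the train/test split, and reporting sensitivity and specificity separately for each severity score. The code below is only an illustration under stated assumptions, not the authors' pipeline. The record fields (`patient_id`, `label`, `severity`), the function names, and the toy data and predictions are all hypothetical; each image is assumed to carry a single severity score, whereas the study scores each lung segment.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.3, seed=0):
    """Split records so that no patient contributes images to both subsets."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = COVID-19."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # None marks a metric that is undefined for lack of positives/negatives.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


def metrics_by_severity(records, predictions):
    """Group images by severity score and compute both metrics per group."""
    grouped = defaultdict(lambda: ([], []))
    for record, pred in zip(records, predictions):
        labels, preds = grouped[record["severity"]]
        labels.append(record["label"])
        preds.append(pred)
    return {
        severity: sensitivity_specificity(labels, preds)
        for severity, (labels, preds) in sorted(grouped.items())
    }


# Hypothetical toy data: label 1 = COVID-19, 0 = other lung condition.
records = [
    {"patient_id": "p1", "label": 1, "severity": 0},
    {"patient_id": "p1", "label": 1, "severity": 2},
    {"patient_id": "p2", "label": 0, "severity": 0},
    {"patient_id": "p3", "label": 1, "severity": 2},
    {"patient_id": "p4", "label": 0, "severity": 2},
    {"patient_id": "p4", "label": 0, "severity": 0},
]
# Hypothetical classifier outputs, one per record above.
predictions = [1, 1, 0, 0, 0, 1]

train, test = split_by_patient(records)
print(len(train), "training images,", len(test), "test images")
print(metrics_by_severity(records, predictions))
```

Splitting on patient identifiers rather than on individual images means that augmented or repeated images of one patient cannot appear in both subsets. Stratifying the metrics by severity shows whether a classifier still performs well on cases with little or no visible abnormality.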