A Machine Learning Explanation of Incidence Inequalities of SARS-CoV-2 Across 88 Days in 157 Countries
Abstract
Because outbreaks of SARS-CoV-2, the virus that causes COVID-19, will likely continue until effective vaccines are widely administered (1), new capabilities to accurately predict incidence rates by location and time are valuable: knowing the disease burden and specific needs of a given population in advance helps minimize morbidity and mortality. In this study, a random forest of 9,250 regression trees was applied to 6,941 observations of 13 statistically significant predictor variables, with the SARS-CoV-2 incidence rate per 100,000 as the target, across 88 days in 157 countries. One key finding is an algorithm that predicts the daily incidence rate of a SARS-CoV-2 epidemic cycle with a pseudo-R² of 98.5% and explains 97.4% of the variance. Another key finding is the relative importance of 13 demographic, economic, environmental, and public health modulators of the SARS-CoV-2 incidence rate. Four factors proposed in earlier research as potential modulators have no statistically significant relationship with incidence rates (2)(3). These findings give leaders new capabilities for capacity planning, for targeting stay-at-home interventions, and for prioritizing programming, by identifying the atypical social determinants that are the root causes of variance in SARS-CoV-2 incidence. This work also shows that machine learning can accurately and quickly explain disease dynamics for zoonoses with pandemic potential.
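To make the accuracy metric concrete, the following is a minimal sketch of one commonly used definition of pseudo-R² for a regression forest: one minus the mean squared prediction error divided by the variance of the observed values. The abstract does not state which definition the study used, so this formula is an assumption; the function name `pseudo_r2` and the incidence values are hypothetical and are not taken from the study's data.

```python
from statistics import fmean, pvariance


def pseudo_r2(observed, predicted):
    """Return 1 - MSE / Var(observed).

    Assumed definition: the abstract does not specify how its
    pseudo-R^2 was computed. Population variance is used here.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    # Mean squared error between observed and predicted rates
    mse = fmean([(o - p) ** 2 for o, p in zip(observed, predicted)])
    variance = pvariance(observed)
    if variance == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - mse / variance


# Hypothetical incidence rates per 100,000 (illustrative only)
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 31.0, 39.0]

score = pseudo_r2(observed, predicted)
print(f"pseudo-R2 = {score:.3f}")  # prints "pseudo-R2 = 0.980"
```

Here the squared errors average 2.5 and the observed variance is 125, so the score is 1 - 2.5/125 = 0.98. Under this definition, a value near 1 means the prediction error is small relative to the spread of the observed rates.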