Enhanced COVID-19 data for improved prediction of survival
Abstract
The current COVID-19 pandemic, caused by the rapid worldwide spread of the SARS-CoV-2 virus, is having severe consequences for human health and the world economy. The virus affects individuals very differently: many infected patients show only mild symptoms, while others become critically ill. To lessen the impact of the pandemic, an important question is which factors predict a patient's death. Here, we construct an enhanced COVID-19 dataset by processing two existing databases (from Kaggle and the WHO) and using natural language processing methods to enrich the data with local weather conditions and research sentiment.
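The abstract does not state how weather and sentiment are attached to patient records. The following is a minimal sketch of one way such an enrichment join could look. It assumes each record carries hypothetical `country`, `city`, and `date` fields, that weather is keyed by `(city, date)`, and that a per-country research-sentiment score has already been computed by a separate NLP step (not shown). The field names and values are illustrative placeholders, not the dataset's actual columns or data.

```python
def enhance_records(
    patients: list[dict],
    weather: dict[tuple[str, str], dict[str, float]],
    sentiment: dict[str, float],
) -> list[dict]:
    """Attach local weather and country-wise research sentiment to each patient record.

    Field names (country, city, date, temperature, humidity, sentiment) are
    hypothetical placeholders, not the dataset's actual columns.
    """
    enhanced = []
    for patient in patients:
        row = dict(patient)  # copy so the input records are not mutated
        # Look up weather for the patient's location on the reporting date;
        # missing matches become None rather than raising KeyError.
        w = weather.get((patient["city"], patient["date"]), {})
        row["temperature"] = w.get("temperature")
        row["humidity"] = w.get("humidity")
        # Research sentiment is a single score per country, assumed precomputed.
        row["sentiment"] = sentiment.get(patient["country"])
        enhanced.append(row)
    return enhanced


# Made-up values, for illustration only.
patients = [
    {"id": 1, "country": "Italy", "city": "Milan", "date": "2020-03-01", "age": 70, "died": 1},
    {"id": 2, "country": "Japan", "city": "Tokyo", "date": "2020-02-15", "age": 35, "died": 0},
]
weather = {("Milan", "2020-03-01"): {"temperature": 9.5, "humidity": 0.81}}
sentiment = {"Italy": 0.12, "Japan": 0.30}

rows = enhance_records(patients, weather, sentiment)
print(rows[1]["temperature"], rows[1]["sentiment"])  # None 0.3
```

Keeping unmatched lookups as `None` makes missing weather or sentiment explicit, so it can be imputed or dropped in a later step instead of silently defaulting to zero.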
Author summary
In this study, we contribute an enhanced COVID-19 dataset containing 183 samples and 43 features. Applying Extreme Gradient Boosting (XGBoost) to the enhanced dataset achieves 95% accuracy in predicting patient survival; country-wise research sentiment is the most important feature, followed by age and local weather. All data and source code are available at http://ab.inf.uni-tuebingen.de/publications/papers/COVID-19.
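The summary reports XGBoost accuracy and feature importance but not the training setup. To illustrate the underlying idea, here is a small standard-library sketch of gradient-boosted trees with gain-based feature importance. It is plain first-order gradient boosting on logistic loss with depth-1 trees (stumps), not XGBoost itself (which adds second-order leaf weights, regularization, and deeper trees) and not the authors' pipeline. The function names (`boost`, `fit_stump`, `predict_proba`) and the toy data are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X: list[list[float]], r: list[float]):
    """Return (gain, feature, threshold, left_value, right_value) for the split that
    most reduces the squared error of residuals r, or None if no split exists."""
    n = len(r)
    total = sum(r)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            left_sum += r[order[k]]
            a, b = X[order[k]][j], X[order[k + 1]][j]
            if a == b:
                continue  # cannot place a threshold between equal values
            n_left, n_right = k + 1, n - k - 1
            right_sum = total - left_sum
            # Reduction in sum of squared errors when each side predicts its mean.
            gain = left_sum**2 / n_left + right_sum**2 / n_right - total**2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (a + b) / 2, left_sum / n_left, right_sum / n_right)
    return best


def boost(X, y, rounds: int = 50, lr: float = 0.3):
    """First-order gradient boosting on logistic loss with depth-1 trees."""
    n, d = len(y), len(X[0])
    p = min(max(sum(y) / n, 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # start from the log-odds of the base rate
    F = [f0] * n
    stumps, importance = [], [0.0] * d
    for _ in range(rounds):
        # Negative gradient of log loss w.r.t. the raw score: y - sigmoid(F).
        r = [y[i] - sigmoid(F[i]) for i in range(n)]
        best = fit_stump(X, r)
        if best is None:
            break
        gain, j, t, left_value, right_value = best
        importance[j] += gain  # accumulate total split gain per feature
        stumps.append((j, t, lr * left_value, lr * right_value))
        for i in range(n):
            F[i] += lr * (left_value if X[i][j] <= t else right_value)
    total = sum(importance) or 1.0
    return f0, stumps, [g / total for g in importance]


def predict_proba(model, x) -> float:
    f0, stumps, _ = model
    z = f0 + sum(lv if x[j] <= t else rv for j, t, lv, rv in stumps)
    return sigmoid(z)


# Toy data (hypothetical): column 0 stands in for age, column 1 is noise.
X = [[20, 3], [30, 1], [40, 4], [50, 1], [55, 5],
     [65, 9], [70, 2], [75, 6], [80, 5], [85, 3]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

model = boost(X, y, rounds=20, lr=0.3)
accuracy = sum((predict_proba(model, x) > 0.5) == bool(t) for x, t in zip(X, y)) / len(y)
print(accuracy, model[2])  # 1.0 [1.0, 0.0]
```

Because every stump in this toy example splits on column 0, that column receives all of the normalized gain. In the actual `xgboost` Python package, comparable per-feature scores can be read from a trained booster with `Booster.get_score(importance_type="gain")`; note that XGBoost's `"gain"` is averaged per split, whereas this sketch sums gain, which is closer to `"total_gain"`.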