Understand research hotspots surrounding COVID-19 and other coronavirus infections using topic modeling
Abstract
Background
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a virus that causes severe respiratory illness in humans, which results in global outbreak of novel coronavirus disease (COVID-19) currently. This study aimed to evaluate the characteristics of publications involving coronaviruses as well as COVID-19 by using topic modeling.
Methods
We extracted all abstracts and retained the most informative words from the COVID-19 Open Research Dataset, which contains 35,092 pieces of coronavirus related literature published up to March 20, 2020. Using Latent Dirichlet Allocation modeling, we trained a topic model from the corpus, analyzed the semantic relationships between topics and compared the topic distribution between COVID-19 and other CoV infections.
Results
Eight topics emerged overall: clinical characterization, pathogenesis research, therapeutics research, epidemiological study, virus transmission, vaccines research, virus diagnostics, and viral genomics. It was observed that current COVID-19 research puts more emphasis on clinical characterization, epidemiological study, and virus transmission. In contrast, topics about diagnostics, therapeutics, vaccines, genomics and pathogenesis only account for less than 10% or even 4% of all the COVID-19 publications, much lower than those of other CoV infections.
Conclusions
These results identified knowledge gaps in the area of COVID-19 and offered directions for future research.
Related articles
Related articles are currently not available for this article.