A Generalizable Data Assembly Algorithm for Infectious Disease Outbreaks

This article has 1 evaluations Published on
Read the full article Related papers
This article on Sciety

Abstract

Background & Objective

During infectious disease outbreaks, health agencies often share text-based information about cases and deaths. This information is usually text-based and rarely machine-readable, thus creating challenges for outbreak researchers. Here, we introduce a generalizable data assembly algorithm that automatically curates text-based, outbreak-related information and demonstrate its performance across three outbreaks.

Methods

After developing an algorithm with regular expressions, we automatically curated data from health agencies via three information sources: formal reports, email newsletters, and Twitter. A validation data set was also curated manually for each outbreak.

Findings

When compared against the validation data sets, the overall cumulative missingness and misidentification of the algorithmically curated data were ≤2% and ≤1%, respectively, for all three outbreaks.

Conclusions

Within the context of outbreak research, our work successfully addresses the need for generalizable tools that can transform text-based information into machine-readable data across varied information sources and infectious diseases.

Related articles

Related articles are currently not available for this article.