Eliciting Disease Data from Wikipedia Articles
Geoffrey Fairchild (1, 3), Lalindra De Silva (2), Sara Y. Del Valle (1), Alberto M. Segre (3) ((1) Los Alamos National Laboratory, Los Alamos, NM, USA, (2) The University of Utah, Salt Lake City, UT, USA, (3) The University of Iowa, Iowa City, IA, USA)

TL;DR
This paper demonstrates how Wikipedia articles can be used to extract real-time disease data through named-entity recognition, creating a community-driven system for disease monitoring and data sharing.
Contribution
It introduces a method to extract structured disease data from Wikipedia articles using a trained named-entity recognizer, filling a gap in existing internet-based surveillance systems.
Findings
The named-entity recognizer achieved an F1 score of 0.753.
Time series data extracted from the Wikipedia article closely matched ground truth during the 2014 West African Ebola epidemic.
Wikipedia can serve as a community-driven open-source disease data repository.
Abstract
Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, that have caused a movement towards internet-based disease surveillance systems. Internet systems are particularly attractive for disease outbreaks because they can provide data in near real-time and can be verified by individuals around the globe. However, most existing systems have focused on disease monitoring and do not provide a data repository for policy makers or researchers. In order to fill this gap, we analyzed Wikipedia article content. We demonstrate how a named-entity recognizer can be trained to tag case counts, death counts, and hospitalization counts in the article narrative, achieving an F1 score of 0.753. We also show, using the 2014 West African Ebola virus disease epidemic article as a case study, that there are detailed time series…
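For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The sketch below shows one way it could be computed for a tagger at the token level. The tag names (CASES, DEATHS, O), the example sentence, and the token-level micro-averaged scheme are illustrative assumptions, not the paper's actual label set or evaluation protocol.

```python
from typing import Sequence


def token_f1(gold: Sequence[str], predicted: Sequence[str], outside: str = "O") -> float:
    """Token-level micro-averaged F1 over all tags other than `outside`.

    Assumption: one tag per token. The paper's own scoring scheme may differ.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # A true positive is a token whose predicted tag equals its gold tag
    # and is an entity tag (not the "outside" tag).
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == p and g != outside)
    pred_pos = sum(1 for p in predicted if p != outside)
    gold_pos = sum(1 for g in gold if g != outside)

    if true_pos == 0:
        return 0.0

    precision = true_pos / pred_pos
    recall = true_pos / gold_pos
    return 2 * precision * recall / (precision + recall)


# Invented sentence, for illustration only:
# "The ministry reported 120 cases and 45 deaths"
gold_tags = ["O", "O", "O", "CASES", "O", "O", "DEATHS", "O"]
pred_tags = ["O", "O", "O", "CASES", "O", "O", "O", "O"]  # tagger misses the death count

score = token_f1(gold_tags, pred_tags)
print(f"F1 = {score:.3f}")
```

In this invented example the tagger finds one of the two gold entities and makes no false predictions, so precision is 1.0 and recall is 0.5.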
