TL;DR
This paper explores using Wikipedia access logs as a cost-effective, timely data source for monitoring and forecasting infectious diseases globally, demonstrating promising results with simple models.
Contribution
It introduces an approach that uses Wikipedia access logs for disease monitoring, addressing three challenges that limit methods based on social internet data: breadth of diseases and countries, scientific peer review, and forecasting.
Findings
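The best models achieved an R-squared of up to 0.92.
Forecasts showed value up to 28 days ahead, the longest horizon tested.
Several pairs of models were similar enough to suggest that transferring a model from one location to another without retraining is feasible.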
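To make the R-squared and forecasting findings concrete, here is a minimal sketch of how a lagged linear model can be scored. It is not the authors' code: the function names (`fit_line`, `r_squared`, `lagged_pairs`), the single-predictor setup, and the numbers are all assumptions made for illustration, and the data is synthetic, built so that case counts follow article views by exactly two time steps.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def r_squared(x, y, a, b):
    """Coefficient of determination of the fitted line on (x, y)."""
    my = mean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def lagged_pairs(views, cases, lag):
    """Pair views at step t with cases at step t + lag (forecast horizon)."""
    n = len(views)
    return list(views[: n - lag]), list(cases[lag:])


# Synthetic, made-up series (NOT data from the paper).
views = [10, 12, 15, 20, 30, 45, 40, 28, 18, 12, 10, 9]
# Cases trail views by two steps: cases[t] = 5 + 2 * views[t - 2].
cases = [25, 25] + [5 + 2 * v for v in views[:-2]]

# Score the same model at a horizon of 0 steps and of 2 steps.
x0, y0 = lagged_pairs(views, cases, lag=0)
a0, b0 = fit_line(x0, y0)
r2_lag0 = r_squared(x0, y0, a0, b0)

x2, y2 = lagged_pairs(views, cases, lag=2)
a2, b2 = fit_line(x2, y2)
r2_lag2 = r_squared(x2, y2, a2, b2)

print(f"lag 0: R^2 = {r2_lag0:.3f}")
print(f"lag 2: R^2 = {r2_lag2:.3f}, intercept = {a2:.2f}, slope = {b2:.2f}")
```

Because the synthetic cases were constructed from views two steps earlier, the fit at lag 2 is much better than the fit at lag 0. In this toy setup, forecasting value at a given horizon means that views observed today still explain cases reported that many steps later.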
Abstract
Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data such as social media and search queries are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and…
