A machine learning methodology for real-time forecasting of the 2019-2020 COVID-19 outbreak using Internet searches, news alerts, and estimates from mechanistic models
Dianbo Liu, Leonardo Clemente, Canelle Poirier, Xiyu Ding, Matteo Chinazzi, Jessica T Davis, Alessandro Vespignani, Mauricio Santillana

TL;DR
This paper introduces a machine learning approach that combines mechanistic model estimates with digital data sources to forecast COVID-19 activity in Chinese provinces two days in advance, demonstrating high accuracy and potential for broader application.
Contribution
The study presents a novel, interpretable machine learning methodology that integrates diverse data sources and geo-spatial clustering to improve real-time COVID-19 forecasting accuracy.
Findings
Model outperforms baseline models in 27 of 32 provinces
Uses diverse data sources including health reports, internet searches, news, and mechanistic forecasts
Enables reliable two-day-ahead COVID-19 activity predictions
Abstract
We present a timely and novel methodology that combines disease estimates from mechanistic models with digital traces, via interpretable machine-learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real time. Specifically, our method is able to produce stable and accurate forecasts 2 days ahead of current time, and uses as inputs (a) official health reports from the Chinese Center for Disease Control and Prevention (China CDC), (b) COVID-19-related internet search activity from Baidu, (c) news media activity reported by Media Cloud, and (d) daily forecasts of COVID-19 activity from GLEAM, an agent-based mechanistic model. Our machine-learning methodology uses a clustering technique that enables the exploitation of geo-spatial synchronicities of COVID-19 activity across Chinese provinces, and a data augmentation technique to deal with the small number of…
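To make the "2 days ahead" setup concrete, the sketch below shows one plausible way to line up training data: the input signals observed on day t are paired with the activity reported on day t + 2. It is an illustration under stated assumptions, not the authors' implementation. The function name `make_training_pairs`, the signal keys, and the toy numbers are all hypothetical, and the abstract does not say which regression model consumes these pairs.

```python
from typing import Dict, List, Sequence, Tuple

HORIZON = 2  # forecast horizon in days, as stated in the abstract


def make_training_pairs(
    signals: Dict[str, Sequence[float]],
    target: Sequence[float],
    horizon: int = HORIZON,
) -> Tuple[List[List[float]], List[float]]:
    """Pair the signals observed on day t with the target on day t + horizon.

    Hypothetical helper: the names and data layout are assumptions made
    for illustration and do not come from the paper.
    """
    names = sorted(signals)  # fixed column order: alphabetical by signal name
    n_days = len(target)
    if any(len(signals[name]) != n_days for name in names):
        raise ValueError("all series must cover the same days")

    rows: List[List[float]] = []
    labels: List[float] = []
    # The last `horizon` days have no known future target, so they are skipped.
    for t in range(n_days - horizon):
        rows.append([float(signals[name][t]) for name in names])
        labels.append(float(target[t + horizon]))
    return rows, labels


# Toy series (made-up numbers, not real data) for a single province.
toy_signals = {
    "cdc_reports": [1, 2, 3, 4, 5],
    "baidu_search": [10, 20, 30, 40, 50],
}
toy_target = [1, 2, 3, 4, 5]

features, labels = make_training_pairs(toy_signals, toy_target)
```

Each row of `features` holds only information available on day t, so a model fitted on these pairs cannot use data from the future when forecasting day t + 2.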
Taxonomy
Topics: COVID-19 epidemiological studies · Data-Driven Disease Surveillance · Anomaly Detection Techniques and Applications
