Seasonal Web Search Query Selection for Influenza-Like Illness (ILI) Estimation
Niels Dalum Hansen, Kåre Mølbak, Ingemar J. Cox, and Christina Lioma

TL;DR
This paper improves influenza-like illness estimation from web search data by modeling seasonal variations and selecting queries correlated with residuals, reducing spurious seasonal correlations and enhancing accuracy.
Contribution
It introduces a method to account for seasonality in query selection, improving ILI estimation accuracy from web search logs.
Findings
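Re-ranking queries obtained from Google Correlate by their correlation with the residual of the seasonal model strongly favours ILI-related queries.
Modeling the seasonal variation in ILI activity and correlating queries against the residual reduces spurious correlations with queries that are seasonal but unrelated to ILI.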
Experimental results show improved ILI estimation performance.
Abstract
Influenza-like illness (ILI) estimation from web search data is an important web analytics task. The basic idea is to use the frequencies of queries in web search logs that are correlated with past ILI activity as features when estimating current ILI activity. It has been noted that since influenza is seasonal, this approach can lead to spurious correlations with features/queries that also exhibit seasonality, but have no relationship with ILI. Spurious correlations can, in turn, degrade performance. To address this issue, we propose modeling the seasonal variation in ILI activity and selecting queries that are correlated with the residual of the seasonal model and the observed ILI signal. Experimental results show that re-ranking queries obtained by Google Correlate based on their correlation with the residual strongly favours ILI-related queries.
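The selection idea described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the abstract does not state the form of the seasonal model, so a single annual sine/cosine pair fitted by least squares is assumed here, and the function names (`fit_seasonal`, `rank_by_residual`), the query names, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_seasonal(ili, period=52.0):
    """Least-squares fit of a constant plus one sine/cosine pair.

    Assumption: weekly data with an annual period of 52 weeks.
    """
    t = np.arange(len(ili), dtype=float)
    X = np.column_stack([
        np.ones_like(t),
        np.sin(2 * np.pi * t / period),
        np.cos(2 * np.pi * t / period),
    ])
    coef, *_ = np.linalg.lstsq(X, ili, rcond=None)
    return X @ coef

def rank_by_residual(ili, query_freqs, period=52.0):
    """Rank queries by Pearson correlation with the seasonal-model residual."""
    residual = ili - fit_seasonal(ili, period)
    scores = {
        name: float(np.corrcoef(freq, residual)[0, 1])
        for name, freq in query_freqs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Synthetic example: four years of weekly data (all values are made up).
rng = np.random.default_rng(0)
t = np.arange(208)
seasonal = np.sin(2 * np.pi * t / 52.0)
outbreak = rng.normal(size=t.size)          # non-seasonal ILI variation
ili = 2.0 + seasonal + 0.5 * outbreak

query_freqs = {
    # Tracks ILI itself, including the non-seasonal part.
    "flu symptoms": ili + 0.1 * rng.normal(size=t.size),
    # Purely seasonal, unrelated to ILI.
    "ski holidays": seasonal + 0.1 * rng.normal(size=t.size),
}

ranked = rank_by_residual(ili, query_freqs)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy setup both queries correlate strongly with the raw ILI signal because they share the annual cycle, but only the query that also follows the non-seasonal variation keeps a high correlation with the residual, which is the behaviour the re-ranking relies on.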
