An extended note on the multibin logarithmic score used in the FluSight competitions
Johannes Bracher

TL;DR
This paper examines the multibin logarithmic score used in the CDC's FluSight influenza forecasting competitions. It shows that the score is not proper, so it can reward forecasts that differ from a forecaster's true beliefs, and it illustrates the consequences using forecasts from the 2016/17 competition.
Contribution
It analyzes the properties of the multibin logarithmic score and shows what its impropriety means in practice for the evaluation of influenza forecasts.
Findings
The multibin logarithmic score is not a proper scoring rule.
It may therefore encourage dishonest forecasting.
The practical consequences are illustrated with forecasts from the 2016/17 FluSight competition.
Abstract
In recent years the Centers for Disease Control and Prevention (CDC) have organized FluSight influenza forecasting competitions. To evaluate the participants' forecasts a multibin logarithmic score has been created, which is a non-standard variant of the established logarithmic score. Unlike the original log score, the multibin version is not proper and may thus encourage dishonest forecasting. We explore the practical consequences this may have, using forecasts from the 2016/17 FluSight competition for illustration.
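The impropriety described in the abstract can be illustrated with a toy calculation. The sketch below is not taken from the paper. It assumes the multibin score is the logarithm of the total probability a forecast assigns to bins within a fixed tolerance of the observed bin. The five-bin distribution, the tolerance of one bin, and all function names are hypothetical choices made for illustration.

```python
import math


def log_score(forecast, observed):
    """Standard log score: log of the probability given to the observed bin."""
    p = forecast[observed]
    return math.log(p) if p > 0 else float("-inf")


def multibin_log_score(forecast, observed, tolerance=1):
    """Assumed multibin variant: log of the probability mass within
    `tolerance` bins on either side of the observed bin."""
    lo = max(0, observed - tolerance)
    hi = min(len(forecast), observed + tolerance + 1)
    mass = sum(forecast[lo:hi])
    return math.log(mass) if mass > 0 else float("-inf")


def expected_score(score, forecast, truth):
    """Expected score of `forecast` when outcomes follow `truth`."""
    return sum(p * score(forecast, k) for k, p in enumerate(truth) if p > 0)


# Hypothetical true distribution over five bins (illustrative only).
truth = [0.0, 1 / 3, 1 / 3, 1 / 3, 0.0]
honest = truth                           # reports the true belief
sharpened = [0.0, 0.0, 1.0, 0.0, 0.0]    # overconfident point forecast

mb_honest = expected_score(multibin_log_score, honest, truth)
mb_sharpened = expected_score(multibin_log_score, sharpened, truth)
ls_honest = expected_score(log_score, honest, truth)
ls_sharpened = expected_score(log_score, sharpened, truth)

print("multibin score:", mb_honest, "(honest) vs", mb_sharpened, "(sharpened)")
print("log score:     ", ls_honest, "(honest) vs", ls_sharpened, "(sharpened)")
```

In this toy setting the sharpened forecast earns a higher expected multibin score than the honest one, while the standard log score ranks the honest forecast first. That is the sense in which a score that is not proper can reward a forecaster for misreporting beliefs.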
Taxonomy
Topics: Data-Driven Disease Surveillance · COVID-19 epidemiological studies · Influenza Virus Research Studies
