How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition
Christine M. Anderson-Cook, Kary L. Myers, Lu Lu, Michael L. Fugate, Kevin R. Quinlan, Norma Pawley

TL;DR
This paper offers strategic guidance on designing, analyzing, and interpreting data competitions to improve learning outcomes, prevent overfitting, and gain deeper insights into algorithm performance beyond leaderboard rankings.
Contribution
It introduces a comprehensive framework for competition design and post-competition analysis using exploratory data analysis and generalized linear models, enhancing understanding of algorithm strengths and weaknesses.
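To make the GLM idea concrete, here is a minimal sketch of one way a host could model per-case correctness, assuming a binomial GLM with a logit link. The two competitors, the "easy"/"hard" case label, and all counts are hypothetical illustrations and are not taken from the paper. The model is fitted by Newton's method (iteratively reweighted least squares) in plain Python so the example is self-contained. With these counts, competitor B leads the overall leaderboard (30/40 = 0.75 versus 28/40 = 0.70 for A), yet A is more accurate on easy cases. The interaction term captures this difference in strengths, which a single ranking hides.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Fit a binomial GLM with logit link by Newton / IRLS steps."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # X^T (y - mu)
        info = [[0.0] * p for _ in range(p)]  # X^T W X, W = mu (1 - mu)
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def build_rows():
    """Expand hypothetical cell counts into one row per (competitor, test case)."""
    # (competitor_is_B, case_is_hard, n_correct, n_total): illustrative counts only
    cells = [(0, 0, 18, 20), (0, 1, 10, 20), (1, 0, 16, 20), (1, 1, 14, 20)]
    X, y = [], []
    for is_b, is_hard, n_correct, n_total in cells:
        for i in range(n_total):
            X.append([1.0, is_b, is_hard, is_b * is_hard])  # intercept, main effects, interaction
            y.append(1.0 if i < n_correct else 0.0)
    return X, y


X, y = build_rows()
beta = fit_logistic(X, y)
names = ["intercept", "competitor_B", "hard_case", "B_x_hard"]
for name, b in zip(names, beta):
    print(f"{name:>12s}: {b:+.3f}")
```

Because this model has one parameter per competitor-by-difficulty cell, its fitted probabilities reproduce the observed cell proportions. The negative `hard_case` coefficient shows hard cases are harder for A, and the positive `B_x_hard` coefficient shows B degrades much less on them. In a real analysis, additional case-level covariates would replace the single difficulty flag.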
Findings
Effective competition design reduces overfitting risks.
Post-competition analysis reveals detailed algorithm performance insights.
Richer summaries improve interpretation beyond leaderboard rankings.
Abstract
Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. This paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition based on our experience. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as…
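One common way to limit leaderboard overfitting is to score the live leaderboard on only a "public" part of the held-out data and to rank final entries on a separate "private" part. The sketch below assumes that practice; it is not necessarily the exact design the authors recommend. It stratifies the split by a case attribute, so both parts contain the same mix of case types. The function names, the `public_frac` parameter, and the "easy"/"hard" labels are hypothetical.

```python
import random
from collections import defaultdict


def split_leaderboard(case_ids, strata, public_frac=0.3, seed=0):
    """Stratified split of held-out cases into public (live) and private (final) subsets."""
    rng = random.Random(seed)  # fixed seed so the host can reproduce the split
    by_stratum = defaultdict(list)
    for cid, s in zip(case_ids, strata):
        by_stratum[s].append(cid)
    public, private = [], []
    for s in sorted(by_stratum):  # sorted for deterministic ordering
        ids = by_stratum[s][:]
        rng.shuffle(ids)
        n_pub = round(public_frac * len(ids))  # same fraction within every stratum
        public.extend(ids[:n_pub])
        private.extend(ids[n_pub:])
    return public, private


def subset_accuracy(correct, subset):
    """Fraction of cases in `subset` that an entry got right (correct: case id -> bool)."""
    return sum(correct[c] for c in subset) / len(subset)


# Hypothetical held-out set: 100 cases, 60 labelled "easy" and 40 labelled "hard".
case_ids = list(range(100))
strata = ["easy"] * 60 + ["hard"] * 40
public, private = split_leaderboard(case_ids, strata)

# Hypothetical entry: shown its public score during the competition,
# ranked on its private score at the end.
correct = {c: c % 4 != 0 for c in case_ids}
print("public:", subset_accuracy(correct, public))
print("private:", subset_accuracy(correct, private))
```

A large gap between an entry's public and private scores is one warning sign that it was tuned to the leaderboard subset rather than to the underlying problem.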
