The Ladder: A Reliable Leaderboard for Machine Learning Competitions
Avrim Blum, Moritz Hardt

TL;DR
This paper introduces 'the Ladder,' a new algorithm for maintaining a reliable leaderboard in machine learning competitions, offering strong theoretical guarantees, robustness against attacks, and practical deployment without tuning.
Contribution
The paper presents the Ladder algorithm, providing the first strong theoretical guarantees for adaptive leaderboard accuracy in machine learning competitions.
Findings
Provides strong theoretical guarantees even when submissions are chosen adaptively
Resists practical adversarial attacks
Achieves high utility on real Kaggle competition data
Abstract
The organizer of a machine learning competition faces the problem of maintaining an accurate leaderboard that faithfully represents the quality of the best submission of each competing team. What makes this estimation problem particularly challenging is its sequential and adaptive nature. As participants are allowed to repeatedly evaluate their submissions on the leaderboard, they may begin to overfit to the holdout data that supports the leaderboard. Few theoretical results give actionable advice on how to design a reliable leaderboard. Existing approaches therefore often resort to poorly understood heuristics such as limiting the bit precision of answers and the rate of re-submission. In this work, we introduce a notion of "leaderboard accuracy" tailored to the format of a competition. We introduce a natural algorithm called "the Ladder" and demonstrate that it simultaneously…
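To make the idea concrete, here is a minimal sketch of the fixed-step form of the Ladder as it is usually described: the leaderboard keeps one best score per team, updates it only when a new submission beats it by more than a step size, and publishes the score rounded to that step size. The class name `Ladder`, the method `submit`, and the default step size of 0.01 are assumptions made for this illustration and do not come from the abstract. Computing the holdout loss itself is left to the caller.

```python
import math


class Ladder:
    """Hypothetical sketch of a fixed-step Ladder leaderboard for one team.

    Scores are losses, so lower is better.
    """

    def __init__(self, step_size=0.01):
        # step_size is an assumed value chosen for illustration only.
        self.step_size = step_size
        self.best = math.inf  # no score has been released yet

    def submit(self, holdout_loss):
        """Return the score to publish after a submission with this holdout loss."""
        # Update only when the improvement exceeds the step size; smaller
        # changes are hidden, so they leak no feedback about the holdout set.
        if holdout_loss < self.best - self.step_size:
            # Publish the loss rounded to the nearest multiple of the step size.
            self.best = round(holdout_loss / self.step_size) * self.step_size
        return self.best


if __name__ == "__main__":
    board = Ladder(step_size=0.01)
    for loss in (0.30, 0.295, 0.25, 0.40):
        print(loss, "->", board.submit(loss))
```

In this sketch a submission with loss 0.295 made after one with loss 0.30 leaves the published score unchanged, because the improvement is smaller than the step size. That is the property that limits how much a participant can learn about the holdout data from repeated submissions.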
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Privacy-Preserving Technologies in Data
