Competitions in AI -- Robustly Ranking Solvers Using Statistical Resampling
Chris Fawcett, Mauro Vallati, Holger H. Hoos, Alfonso E. Gerevini

TL;DR
This paper investigates the stability of AI solver competition rankings using statistical resampling, revealing their sensitivity to instance set variations and proposing a robust analysis method with confidence intervals.
Contribution
It introduces a novel resampling-based approach for statistically robust analysis of solver competition results, addressing issues of rank sensitivity and reproducibility.
Findings
Competition rankings are sensitive to small changes in benchmark sets.
Statistical ties and rank inversions are common in current competition results.
The proposed method provides confidence intervals and more reliable solver rankings.
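The findings above can be illustrated with a generic bootstrap over benchmark instances. This summary does not give the paper's exact procedure, so the following is only a minimal sketch under stated assumptions: each solver has one numeric score per instance (higher is better), a solver's competition score is the sum over instances, and a percentile interval over the resampled ranks stands in for the confidence intervals mentioned. The function name `bootstrap_rank_intervals`, its parameters, and the toy data are hypothetical and chosen for illustration only.

```python
import random


def bootstrap_rank_intervals(results, n_resamples=1000, alpha=0.05, seed=0):
    """Estimate a percentile interval for each solver's rank.

    results: dict mapping solver name -> list of per-instance scores,
             with all lists in the same instance order (assumed layout).
    Returns: dict mapping solver name -> (lowest_rank, highest_rank).
    """
    rng = random.Random(seed)
    solvers = list(results)
    n_instances = len(next(iter(results.values())))
    ranks = {s: [] for s in solvers}

    for _ in range(n_resamples):
        # Resample the instance set with replacement (same size as original).
        sample = [rng.randrange(n_instances) for _ in range(n_instances)]
        totals = {s: sum(results[s][i] for i in sample) for s in solvers}
        for s in solvers:
            # Rank 1 is best; tied solvers share the better rank.
            ranks[s].append(1 + sum(totals[t] > totals[s] for t in solvers))

    intervals = {}
    for s in solvers:
        ordered = sorted(ranks[s])
        lo = ordered[int((alpha / 2) * n_resamples)]
        hi = ordered[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
        intervals[s] = (lo, hi)
    return intervals


# Hypothetical toy data: 1 = instance solved, 0 = not solved.
toy_results = {
    "solver_a": [1] * 20,
    "solver_b": [1, 0] * 10,
    "solver_c": [0, 1, 0, 0] * 5,
}
toy_intervals = bootstrap_rank_intervals(toy_results)
```

Under these assumptions, solvers whose rank intervals overlap would be treated as statistically indistinguishable on this benchmark, while a narrow interval indicates a rank that is stable under changes to the instance set.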
Abstract
Solver competitions play a prominent role in assessing and advancing the state of the art for solving many problems in AI and beyond. Notably, in many areas of AI, competitions have had substantial impact in guiding research and applications for many years, and for a solver to be ranked highly in a competition carries considerable weight. But to what extent can we expect competition results to generalise to sets of problem instances different from those used in a particular competition? This is the question we investigate here, using statistical resampling techniques. We show that the rankings resulting from the standard interpretation of competition results can be very sensitive to even minor changes in the benchmark instance set used as the basis for assessment and can therefore not be expected to carry over to other samples from the same underlying instance distribution. To address…
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Constraint Satisfaction and Optimization · Multi-Criteria Decision Making
