Competitions in AI -- Robustly Ranking Solvers Using Statistical Resampling

Chris Fawcett; Mauro Vallati; Holger H. Hoos; Alfonso E. Gerevini

arXiv:2308.05062·cs.AI·August 10, 2023·1 cites

Open Access

TL;DR

This paper investigates the stability of AI solver competition rankings using statistical resampling, revealing their sensitivity to instance set variations and proposing a robust analysis method with confidence intervals.

Contribution

It introduces a novel resampling-based approach for statistically robust analysis of solver competition results, addressing issues of rank sensitivity and reproducibility.

Findings

01 Competition rankings are sensitive to small changes in benchmark sets.

02 Statistical ties and rank inversions are common in current competition results.

03 The proposed method provides confidence intervals and more reliable solver rankings.

Abstract

Solver competitions play a prominent role in assessing and advancing the state of the art for solving many problems in AI and beyond. Notably, in many areas of AI, competitions have had substantial impact in guiding research and applications for many years, and for a solver to be ranked highly in a competition carries considerable weight. But to what extent can we expect competition results to generalise to sets of problem instances different from those used in a particular competition? This is the question we investigate here, using statistical resampling techniques. We show that the rankings resulting from the standard interpretation of competition results can be very sensitive to even minor changes in the benchmark instance set used as the basis for assessment and therefore cannot be expected to carry over to other samples from the same underlying instance distribution. To address…
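The resampling idea in the abstract can be illustrated with a small, generic bootstrap sketch. This is not the authors' procedure, which the truncated abstract does not spell out. It assumes each solver has one numeric score per benchmark instance (higher is better), that the competition score is the plain sum over instances, and that ties are broken alphabetically. The names `rank_solvers` and `bootstrap_rank_intervals` are hypothetical and chosen only for this illustration.

```python
import random
from typing import Dict, List, Tuple


def rank_solvers(scores: Dict[str, List[float]], idx: List[int]) -> Dict[str, int]:
    """Rank solvers (1 = best) by total score over the chosen instance indices.

    Assumption: higher total is better; ties are broken by solver name.
    """
    totals = {s: sum(v[i] for i in idx) for s, v in scores.items()}
    ordered = sorted(totals, key=lambda s: (-totals[s], s))
    return {s: r for r, s in enumerate(ordered, start=1)}


def bootstrap_rank_intervals(
    scores: Dict[str, List[float]],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Dict[str, Tuple[int, int]]:
    """Percentile bootstrap interval for each solver's rank.

    Each resample draws instances with replacement from the original
    benchmark set and recomputes the ranking on that resampled set.
    """
    rng = random.Random(seed)
    n = len(next(iter(scores.values())))
    ranks: Dict[str, List[int]] = {s: [] for s in scores}
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        for s, r in rank_solvers(scores, idx).items():
            ranks[s].append(r)
    intervals = {}
    for s, rs in ranks.items():
        rs.sort()
        lo = rs[int((alpha / 2) * (len(rs) - 1))]
        hi = rs[int((1 - alpha / 2) * (len(rs) - 1))]
        intervals[s] = (lo, hi)
    return intervals


if __name__ == "__main__":
    # Made-up toy scores for two hypothetical solvers on 20 instances.
    toy = {"C": [1.0, 0.0] * 10, "D": [0.0, 1.0] * 10}
    print(bootstrap_rank_intervals(toy))
```

When two solvers' rank intervals overlap, the resampled benchmark sets do not consistently separate them, which is one simple way to read a statistical tie. When a solver's interval is a single rank, its position is stable under this kind of resampling.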

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Constraint Satisfaction and Optimization · Multi-Criteria Decision Making