Beyond Arrow: From Impossibility to Possibilities in Multi-Criteria Benchmarking
Polina Gordienko, Christoph Jansen, Julian Rodemann, Georg Schollmeyer

TL;DR
This paper models multi-criteria benchmarking as a social choice problem, identifying conditions under which aggregation of multiple metrics into a stable, meaningful ranking is feasible, moving beyond traditional impossibility results.
Contribution
The paper formalizes multi-criteria benchmarking as a social choice problem and identifies sufficient conditions on the metric-induced rankings under which aggregation is stable, so that Arrow-style impossibility no longer arises.
Findings
Restricted preference structures (single-peaked, group-separable, and distance-restricted rankings) allow stable aggregation of metrics.
Empirical analysis shows real benchmarks often meet these structural conditions.
The approach enables meaningful multi-criteria model rankings.
Abstract
Modern benchmarks such as HELM MMLU account for multiple metrics like accuracy, robustness and efficiency. When trying to turn these metrics into a single ranking, natural aggregation procedures can become incoherent or unstable to changes in the model set. We formalize this aggregation as a social choice problem where each metric induces a preference ranking over models on each dataset, and a benchmark operator aggregates these votes across metrics. While prior work has focused on Arrow's impossibility result, we argue that the impossibility often originates from pathological examples and identify sufficient conditions under which these disappear, and meaningful multi-criteria benchmarking becomes possible. In particular, we deal with three restrictions on the combinations of rankings and prove that on single-peaked, group-separable and distance-restricted preferences, the benchmark…
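To make the setup in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes pairwise majority voting as the aggregation rule and uses hypothetical models A, B, C ranked by three hypothetical metrics. It shows that an unrestricted profile can produce a majority cycle (no coherent ranking), while a profile that is single-peaked along the axis A, B, C aggregates to a transitive ranking.

```python
from itertools import combinations, permutations


def majority_relation(rankings):
    """Return the set of pairs (a, b) where a strict majority of
    rankings place model a above model b. Each ranking is a list of
    model names, best first."""
    models = rankings[0]
    wins = set()
    for a, b in combinations(models, 2):
        a_over_b = sum(r.index(a) < r.index(b) for r in rankings)
        if 2 * a_over_b > len(rankings):
            wins.add((a, b))
        elif 2 * a_over_b < len(rankings):
            wins.add((b, a))
    return wins


def has_three_cycle(wins, models):
    """True if some triple of models satisfies a > b > c > a."""
    return any(
        (a, b) in wins and (b, c) in wins and (c, a) in wins
        for a, b, c in permutations(models, 3)
    )


models = ["A", "B", "C"]

# Hypothetical rankings induced by three metrics (illustrative data only).
# This is the classic Condorcet profile: every model loses to another.
cyclic_profile = [
    ["A", "B", "C"],  # e.g. accuracy
    ["B", "C", "A"],  # e.g. robustness
    ["C", "A", "B"],  # e.g. efficiency
]

# Single-peaked along the axis A - B - C: each ranking has one peak and
# preference falls off when moving away from it along the axis.
single_peaked_profile = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["C", "B", "A"],
]

cyclic_wins = majority_relation(cyclic_profile)
peaked_wins = majority_relation(single_peaked_profile)

print(has_three_cycle(cyclic_wins, models))
print(has_three_cycle(peaked_wins, models))
print(sorted(peaked_wins))
```

In the second profile the majority relation is B over A, B over C, and A over C, so the aggregate ranking is B, A, C. Removing any one model from that profile leaves the relative order of the remaining two unchanged, which is the kind of stability to changes in the model set that the abstract refers to.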
Taxonomy
Topics: Game Theory and Voting Systems · Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing
