Beyond Arrow: From Impossibility to Possibilities in Multi-Criteria Benchmarking

Polina Gordienko; Christoph Jansen; Julian Rodemann; Georg Schollmeyer

arXiv:2602.07593 · cs.LG · February 10, 2026

Open Access

TL;DR

This paper models multi-criteria benchmarking as a social choice problem, identifying conditions under which aggregation of multiple metrics into a stable, meaningful ranking is feasible, moving beyond traditional impossibility results.

Contribution

The paper formalizes multi-criteria benchmarking as a social choice problem and identifies sufficient conditions on the metric-induced rankings under which metrics can be aggregated stably, despite Arrow-style impossibility results.

Findings

01 Three preference structures (single-peaked, group-separable, and distance-restricted) allow stable aggregation of metrics.

02 Empirical analysis shows real benchmarks often meet these structural conditions.

03 The approach enables meaningful multi-criteria model rankings.

Abstract

Modern benchmarks such as HELM MMLU account for multiple metrics like accuracy, robustness and efficiency. When these metrics are turned into a single ranking, natural aggregation procedures can become incoherent or unstable under changes to the model set. We formalize this aggregation as a social choice problem where each metric induces a preference ranking over models on each dataset, and a benchmark operator aggregates these votes across metrics. While prior work has focused on Arrow's impossibility result, we argue that the impossibility often originates from pathological examples. We identify sufficient conditions under which these pathologies disappear and meaningful multi-criteria benchmarking becomes possible. In particular, we consider three restrictions on the combinations of rankings and prove that on single-peaked, group-separable and distance-restricted preferences, the benchmark…
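The "metrics as voters" framing in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the model names, the scores, the helper functions, and the use of pairwise majority voting as the aggregation rule. It is not the paper's benchmark operator. The sketch shows a score profile whose majority relation cycles, and a single-peaked profile (along the hypothetical axis A, B, C) whose majority relation is transitive.

```python
from itertools import permutations

def ranking_from_scores(scores):
    """Order models from best to worst for one metric (higher score is better)."""
    return sorted(scores, key=scores.get, reverse=True)

def majority_relation(rankings):
    """Return all pairs (a, b) where a strict majority of metrics rank a above b."""
    models = rankings[0]
    relation = set()
    for a, b in permutations(models, 2):
        wins = sum(r.index(a) < r.index(b) for r in rankings)
        if wins > len(rankings) / 2:
            relation.add((a, b))
    return relation

def has_three_cycle(relation, models):
    """Detect a majority cycle a > b > c > a among any three models."""
    return any(
        (a, b) in relation and (b, c) in relation and (c, a) in relation
        for a, b, c in permutations(models, 3)
    )

models = ["A", "B", "C"]

# Hypothetical scores: each metric acts as one voter over the models.
cyclic_scores = {
    "accuracy":   {"A": 0.9, "B": 0.8, "C": 0.7},  # A > B > C
    "robustness": {"A": 0.7, "B": 0.9, "C": 0.8},  # B > C > A
    "efficiency": {"A": 0.8, "B": 0.7, "C": 0.9},  # C > A > B
}

# Hypothetical scores that are single-peaked along the axis A, B, C:
# no metric ranks the middle model B last.
peaked_scores = {
    "accuracy":   {"A": 0.9, "B": 0.8, "C": 0.7},  # A > B > C
    "robustness": {"A": 0.8, "B": 0.9, "C": 0.7},  # B > A > C
    "efficiency": {"A": 0.7, "B": 0.8, "C": 0.9},  # C > B > A
}

cyclic_rankings = [ranking_from_scores(s) for s in cyclic_scores.values()]
peaked_rankings = [ranking_from_scores(s) for s in peaked_scores.values()]

cyclic_relation = majority_relation(cyclic_rankings)
peaked_relation = majority_relation(peaked_rankings)

print("cyclic profile has a majority cycle:", has_three_cycle(cyclic_relation, models))
print("single-peaked profile has a majority cycle:", has_three_cycle(peaked_relation, models))
```

In the first profile no coherent overall ranking exists under majority voting, which is the kind of pathology the abstract refers to. In the second, the majority relation orders the models B, A, C, so a ranking is recovered.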

Taxonomy

Topics: Game Theory and Voting Systems · Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing