Evaluating Agents using Social Choice Theory
Marc Lanctot, Kate Larson, Yoram Bachrach, Luke Marris, Zun Li, Avishkar Bhoopchand, Thomas Anthony, Brian Tanner, Anna Koop

TL;DR
This paper introduces a novel evaluation framework called Voting-as-Evaluation (VasE) that applies social choice theory to assess agents across various domains, offering interpretability, robustness, and axiomatic foundations.
Contribution
It proposes a new evaluation method based on voting theory, leveraging social welfare functions to improve robustness and interpretability in agent evaluation.
Findings
In practice, VasE can be more robust than Elo and Nash averaging.
VasE discovers properties in the evaluation data that are not evident from scores.
Maximal lotteries satisfy key consistency and computational efficiency properties.
Abstract
We argue that many general evaluation problems can be viewed through the lens of voting theory. Each task is interpreted as a separate voter, which requires only ordinal rankings or pairwise comparisons of agents to produce an overall evaluation. By viewing the aggregator as a social welfare function, we are able to leverage centuries of research in social choice theory to derive principled evaluation frameworks with axiomatic foundations. These evaluations are interpretable and flexible, while avoiding many of the problems currently facing cross-task evaluation. We apply this Voting-as-Evaluation (VasE) framework across multiple settings, including reinforcement learning, large language models, and humans. In practice, we observe that VasE can be more robust than popular evaluation frameworks (Elo and Nash averaging), discovers properties in the evaluation data not evident from scores…
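To make the "each task is a voter" idea concrete, here is a minimal sketch, not the authors' implementation. It turns per-task scores into ordinal rankings and aggregates them with two classical voting rules, Borda count and Copeland. The task data, agent names, and function names are all hypothetical and chosen only for illustration. The choice of these two rules is also an assumption, not something the text above specifies.

```python
from itertools import combinations

# Hypothetical raw scores for three agents on three tasks (illustrative data only).
# Each task uses a different scale, so a mean of raw scores is dominated by task 1.
task_scores = [
    {"A": 900.0, "B": 20.0, "C": 10.0},
    {"A": 0.1, "B": 0.3, "C": 0.2},
    {"A": 1.0, "B": 3.0, "C": 2.0},
]


def to_ranking(scores):
    """One task's vote: agents ordered best first (ties broken by name)."""
    return sorted(scores, key=lambda agent: (-scores[agent], agent))


def borda(rankings):
    """Borda count: with m agents, position p (0 = best) earns m - 1 - p points."""
    m = len(rankings[0])
    points = {agent: 0 for agent in rankings[0]}
    for ranking in rankings:
        for position, agent in enumerate(ranking):
            points[agent] += m - 1 - position
    return points


def copeland(rankings):
    """Copeland: 1 point per pairwise majority win, 0.5 per pairwise tie."""
    agents = sorted(rankings[0])
    points = {agent: 0.0 for agent in agents}
    for a, b in combinations(agents, 2):
        a_preferred = sum(r.index(a) < r.index(b) for r in rankings)
        b_preferred = len(rankings) - a_preferred
        if a_preferred > b_preferred:
            points[a] += 1.0
        elif b_preferred > a_preferred:
            points[b] += 1.0
        else:
            points[a] += 0.5
            points[b] += 0.5
    return points


rankings = [to_ranking(scores) for scores in task_scores]
borda_points = borda(rankings)
copeland_points = copeland(rankings)
mean_scores = {
    agent: sum(scores[agent] for scores in task_scores) / len(task_scores)
    for agent in task_scores[0]
}

print("rankings:", rankings)
print("Borda:", borda_points)
print("Copeland:", copeland_points)
print("mean raw score:", mean_scores)
```

With this made-up data, the mean of raw scores favors agent A because task 1 has a much larger scale, while both voting rules favor agent B, which is ranked first on two of the three tasks. Because the rules only use each task's ordering, rescaling any single task's scores leaves the result unchanged.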
Taxonomy
Topics: Game Theory and Voting Systems · Experimental Behavioral Economics Studies · Sports Analytics and Performance
