Soft Condorcet Optimization for Ranking of General Agents
Marc Lanctot, Kate Larson, Michael Kaisers, Quentin Berthet, Ian Gemp, Manfred Diaz, Roberto-Rafael Maura-Rivero, Yoram Bachrach, Anna Koop, Doina Precup

TL;DR
This paper introduces Soft Condorcet Optimization (SCO), a novel ranking method for general AI agents that optimally aggregates performance data across diverse tasks, outperforming traditional methods like Elo in accuracy.
Contribution
The paper proposes SCO, a new ranking scheme inspired by social choice theory, with algorithms and empirical validation demonstrating its effectiveness over existing methods.
Findings
SCO rankings are close to the optimal Kemeny-Young ranking in preference profiles.
SCO accurately approximates ground truth rankings in noisy tournament simulations.
SCO outperforms baselines in ranking human players in the game Diplomacy.
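The first finding compares SCO against the Kemeny-Young ranking: the ordering of agents that disagrees with the fewest pairwise comparisons in the evaluation data. The minimal brute-force sketch below shows that objective. It assumes each vote is recorded as a `(winner, loser)` pair. `pairwise_disagreements` and `kemeny_young` are hypothetical illustrative names, not code from the paper. Exhaustive search over orderings grows factorially with the number of agents, so it only works for tiny examples.

```python
from itertools import permutations


def pairwise_disagreements(ranking, votes):
    """Count (winner, loser) votes that the ranking orders the other way."""
    position = {agent: i for i, agent in enumerate(ranking)}
    return sum(1 for winner, loser in votes if position[winner] > position[loser])


def kemeny_young(agents, votes):
    """Brute-force Kemeny-Young: the ordering with the fewest disagreements.

    Ties are broken by whichever ordering itertools.permutations yields first.
    """
    return min(permutations(agents), key=lambda r: pairwise_disagreements(r, votes))


# Hypothetical toy data: each tuple is one comparison, winner first.
votes = [("A", "B"), ("A", "B"), ("B", "C"), ("B", "C"), ("C", "B"), ("A", "C")]
best = kemeny_young(["A", "B", "C"], votes)
print(best)                                   # ('A', 'B', 'C')
print(pairwise_disagreements(best, votes))    # 1
```

The single unavoidable mistake here is the one `("C", "B")` vote. No ordering can agree with both that vote and the two `("B", "C")` votes.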
Abstract
Driving progress of AI models and agents requires comparing their performance on standardized benchmarks; for general agents, individual performances must be aggregated across a potentially wide variety of different tasks. In this paper, we describe a novel ranking scheme inspired by social choice frameworks, called Soft Condorcet Optimization (SCO), to compute the optimal ranking of agents: the one that makes the fewest mistakes in predicting the agent comparisons in the evaluation data. This optimal ranking is the maximum likelihood estimate when evaluation data (which we view as votes) are interpreted as noisy samples from a ground truth ranking, a solution to Condorcet's original voting system criteria. SCO ratings are maximal for Condorcet winners when they exist, which we show is not necessarily true for the classical rating system Elo. We propose three optimization algorithms to…
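The abstract does not spell out the SCO loss or its three optimization algorithms. Purely as an illustration, the sketch below replaces the hard mistake count with a sigmoid-smoothed version and minimizes it with plain gradient descent. The loss form and the parameters `tau`, `lr`, and `steps` are assumptions. `soft_condorcet_ratings` and `condorcet_winner` are hypothetical names, not the authors' implementation. The example also checks the property the abstract highlights: on this toy data, the Condorcet winner receives the highest rating.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_condorcet_ratings(agents, votes, tau=1.0, lr=0.1, steps=500):
    """Gradient descent on an ASSUMED smoothed count of pairwise mistakes.

    Loss = sum over votes (winner, loser) of sigmoid((r[loser] - r[winner]) / tau).
    As tau -> 0 each term approaches 1 when the ratings order the pair wrongly
    and 0 when they order it correctly, i.e. the Kemeny-style mistake count.
    """
    ratings = {a: 0.0 for a in agents}
    for _ in range(steps):
        grad = {a: 0.0 for a in agents}
        for winner, loser in votes:
            p = sigmoid((ratings[loser] - ratings[winner]) / tau)
            g = p * (1.0 - p) / tau  # derivative of the sigmoid term
            grad[loser] += g         # raising the loser's rating increases the loss
            grad[winner] -= g        # raising the winner's rating decreases it
        for a in agents:
            ratings[a] -= lr * grad[a]
    return ratings


def condorcet_winner(agents, votes):
    """Agent that beats every other agent head-to-head more often than it loses, else None."""
    def wins(a, b):
        return sum(1 for w, l in votes if w == a and l == b)

    for a in agents:
        if all(wins(a, b) > wins(b, a) for b in agents if b != a):
            return a
    return None


# Hypothetical toy data: each tuple is one comparison, winner first.
agents = ["A", "B", "C"]
votes = [("A", "B"), ("A", "B"), ("B", "C"), ("B", "C"), ("C", "B"), ("A", "C")]
ratings = soft_condorcet_ratings(agents, votes)
print(sorted(ratings, key=ratings.get, reverse=True))  # ['A', 'B', 'C']
print(condorcet_winner(agents, votes))                 # A
```

A smaller `tau` makes the surrogate closer to the exact mistake count but flattens its gradients. This trade-off is why a soft relaxation, rather than the hard count, is what gets optimized here.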
Taxonomy
Topics: Multi-Criteria Decision Making · Auction Theory and Applications · Advanced Algebra and Logic
