Learning to Identify Top Elo Ratings: A Dueling Bandits Approach
Xue Yan, Yali Du, Binxin Ru, Jun Wang, Haifeng Zhang, Xu Chen

TL;DR
This paper introduces a dueling bandits-based online scheduling algorithm to efficiently estimate top Elo ratings, significantly reducing the number of matches needed and improving convergence speed in competitive gaming and AI evaluation.
Contribution
It develops a novel bandit framework tailored for Elo rating estimation, achieving constant per-step complexity and sublinear regret, with extensions to multidimensional ratings for intransitive games.
Findings
Reduces sample complexity for top Elo estimation.
Achieves faster convergence and better efficiency in experiments.
Extends to multidimensional Elo ratings for complex games.
Abstract
The Elo rating system is widely adopted to evaluate the skills of (chess) game and sports players. Recently it has also been integrated into machine learning algorithms to evaluate the performance of computerised AI agents. However, an accurate estimation of the Elo rating (for the top players) often requires many rounds of competitions, which can be expensive to carry out. In this paper, to improve the sample efficiency of the Elo evaluation (for top players), we propose an efficient online match scheduling algorithm. Specifically, we identify and match the top players through a dueling bandits framework and tailor the bandit algorithm to the gradient-based update of Elo. We show that it reduces the per-step memory and time complexity to constant, compared to the traditional likelihood maximization approaches requiring O(t) time. Our algorithm has a regret guarantee of…
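For readers unfamiliar with the gradient-based Elo update the abstract refers to, here is a minimal sketch of the standard Elo rule, not the paper's scheduling algorithm. The function names, the K-factor of 32, and the scale of 400 are illustrative assumptions taken from common chess convention. The point it illustrates is that each match touches only two ratings, so the per-step cost is constant regardless of how many matches came before.

```python
def win_probability(r_i, r_j, scale=400.0):
    """Standard Elo expected score of player i against player j.

    The scale of 400 is the conventional chess value (an assumption here).
    """
    return 1.0 / (1.0 + 10.0 ** ((r_j - r_i) / scale))


def elo_update(ratings, i, j, outcome, k=32.0):
    """Apply one Elo step after a single match between players i and j.

    outcome is 1.0 if i wins, 0.0 if j wins, 0.5 for a draw.
    k is the step size (K-factor); 32 is a common default, assumed here.
    Only two entries of `ratings` change, so time and memory per step
    do not depend on the number of past matches.
    """
    expected = win_probability(ratings[i], ratings[j])
    # The update is proportional to (observed - expected), which is the
    # gradient of the logistic log-likelihood of this single match.
    delta = k * (outcome - expected)
    ratings[i] += delta
    ratings[j] -= delta
    return ratings


if __name__ == "__main__":
    ratings = [1500.0, 1500.0, 1500.0]
    elo_update(ratings, 0, 1, 1.0)  # player 0 beats player 1
    print(ratings)
```

A likelihood-maximization approach would instead refit all ratings against the full match history at every round. Which pair (i, j) to play next is the question the paper's dueling bandits scheduler addresses, and this sketch does not cover it.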
Taxonomy
Topics: Sports Analytics and Performance · Artificial Intelligence in Games · Advanced Bandit Algorithms Research
