Elo Ratings for Large Tournaments of Software Agents in Asymmetric Games
Ben Wise

TL;DR
This paper proposes a revised Elo rating system tailored for large tournaments of AI agents in asymmetric games, addressing differences from human players and complex game setups.
Contribution
It introduces modifications to the Elo system to better evaluate AI agents in asymmetric and complex game environments, with guidelines for tournament implementation.
Findings
Revised Elo system accommodates AI training and game asymmetry
Guidelines improve fairness and accuracy in AI tournament ratings
Addresses challenges of large-scale AI evaluation in complex games
Abstract
The Elo rating system has been used worldwide for individual sports and team sports, as exemplified by the European Go Federation (EGF), International Chess Federation (FIDE), International Federation of Association Football (FIFA), and many others. To evaluate the performance of artificial intelligence agents, it is natural to evaluate them on the same Elo scale as humans, such as the rating of 5185 attributed to AlphaGo Zero. There are several fundamental differences between humans and AI that suggest modifications to the system, which in turn require revisiting Elo's fundamental rationale. AI is typically trained on many more games than humans play, and we have little a priori information on newly created AI agents. Further, AI is being extended into games which are asymmetric between the players, and which could even have large complex boards with a different setup in every game,…
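For context, the classic Elo update that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the standard textbook formula only, not the revised system the paper proposes. The function names, the scale of 400, and the K-factor of 32 are assumptions chosen for the example; federations such as FIDE and EGF use their own parameters and variants.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of player A against player B under the classic logistic Elo model.

    `scale` (400 here) is the conventional constant; it is an assumption of this sketch.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_ratings(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one game.

    `score_a` is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    `k` (the K-factor) is a hypothetical choice; real systems vary it.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated players; A wins.
    print(update_ratings(1500.0, 1500.0, 1.0))
```

Note that this symmetric, zero-sum update treats both players identically, which is exactly the assumption the abstract says may need revisiting for asymmetric games and newly created agents.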
Taxonomy
Topics: Sports Analytics and Performance · Educational Games and Gamification · Artificial Intelligence in Games
