On the Limitations of Elo: Real-World Games are Transitive, not Additive

Quentin Bertrand; Wojciech Marian Czarnecki; Gauthier Gidel

arXiv:2206.12301·cs.GT·March 8, 2023

TL;DR

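This paper examines the limitations of Elo ratings in real-world games. It shows that Elo models can fail to extract the transitive component of a game, even in elementary transitive games, and proposes an extension of the Elo score that assigns each player two scores, skill and consistency. The extension is validated empirically on payoff matrices from real-world games played by bots and humans.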

Contribution

The paper shows that Elo models can fail to identify the transitive component of a game, and it introduces a two-score rating system (skill and consistency) for measuring player strength.

Findings

01 Elo models can fail to extract the transitive component, even in elementary transitive games.

02 The proposed system assigns each player two separate scores: skill and consistency.

03 The system is validated empirically on payoff matrices from real-world games played by bots and humans.

Abstract

Real-world competitive games, such as chess, go, or StarCraft II, rely on Elo models to measure the strength of their players. Since these games are not fully transitive, using Elo implicitly assumes they have a strong transitive component that can correctly be identified and extracted. In this study, we investigate the challenge of identifying the strength of the transitive component in games. First, we show that Elo models can fail to extract this transitive component, even in elementary transitive games. Then, based on this observation, we propose an extension of the Elo score: we end up with a disc ranking system that assigns each player two scores, which we refer to as skill and consistency. Finally, we propose an empirical validation on payoff matrices coming from real-world games played by bots and humans.
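
The distinction in the title between "transitive" and "additive" can be illustrated with a short sketch. It uses the standard logistic Elo formula (base 10, scale 400), under which log-odds add up along a chain of players: logit P(A beats C) = logit P(A beats B) + logit P(B beats C). The function names and the toy probabilities 0.9, 0.9, 0.6 are hypothetical and chosen only for illustration; they are not taken from the paper or from the qb3/discrating repository.

```python
import math


def elo_win_probability(rating_a, rating_b, scale=400.0):
    """Standard logistic Elo prediction that player A beats player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def additivity_gap(p_ab, p_bc, p_ac):
    """How far the A-vs-C log-odds are from the sum of A-vs-B and B-vs-C.

    An Elo model always yields a gap of zero (up to rounding).
    """
    return logit(p_ac) - (logit(p_ab) + logit(p_bc))


# Probabilities generated by Elo ratings are additive in log-odds.
p_ab = elo_win_probability(1800, 1600)
p_bc = elo_win_probability(1600, 1400)
p_ac = elo_win_probability(1800, 1400)
elo_gap = additivity_gap(p_ab, p_bc, p_ac)

# Hypothetical toy game (illustrative numbers only): it is transitive,
# since A beats B, B beats C, and A beats C more often than not, but
# the log-odds do not add up, so no set of Elo ratings reproduces all
# three probabilities at once.
toy_gap = additivity_gap(0.9, 0.9, 0.6)

print(f"Elo-generated gap: {elo_gap:.2e}")
print(f"Toy transitive game gap: {toy_gap:.3f}")
```

A gap far from zero means the game's win probabilities cannot be matched exactly by any single-score Elo model, even though the ranking A > B > C is unambiguous.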

Code & Models

Repositories

qb3/discrating (Official)

Taxonomy

Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Gambling Behavior and Treatments