On the Limitations of Elo: Real-World Games are Transitive, not Additive
Quentin Bertrand, Wojciech Marian Czarnecki, Gauthier Gidel

TL;DR
This paper examines the limitations of Elo ratings in real-world games. It shows that Elo can fail to recover the transitive component of a game, and it proposes an extended rating system that assigns each player separate skill and consistency scores, with an empirical validation on real-world payoff matrices.
Contribution
It shows that Elo can fail to identify the transitive component of a game, even in elementary transitive games, and introduces a two-score rating system (skill and consistency) for measuring player strength.
Findings
Elo models can fail even in simple transitive games.
The proposed system assigns separate skill and consistency scores.
The system is validated empirically on payoff matrices from real-world games played by bots and humans.
Abstract
Real-world competitive games, such as chess, go, or StarCraft II, rely on Elo models to measure the strength of their players. Since these games are not fully transitive, using Elo implicitly assumes they have a strong transitive component that can correctly be identified and extracted. In this study, we investigate the challenge of identifying the strength of the transitive component in games. First, we show that Elo models can fail to extract this transitive component, even in elementary transitive games. Then, based on this observation, we propose an extension of the Elo score: we end up with a disc ranking system that assigns each player two scores, which we refer to as skill and consistency. Finally, we propose an empirical validation on payoff matrices coming from real-world games played by bots and humans.
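To make the contrast between a one-score and a two-score rating concrete, here is a minimal Python sketch. The first function is the standard logistic Elo model, where the win probability depends only on the difference of two scalar scores. The second is one possible two-score parameterization; its functional form (`u_i * v_j - v_i * u_j` inside the logistic) and the reading of `u` as skill and `v` as consistency are assumptions made for illustration, since this summary does not state the paper's formula. The function names are hypothetical.

```python
import math


def elo_win_probability(u_i: float, u_j: float) -> float:
    """Standard logistic Elo model: P(i beats j) = sigmoid(u_i - u_j)."""
    return 1.0 / (1.0 + math.exp(-(u_i - u_j)))


def two_score_win_probability(u_i: float, v_i: float,
                              u_j: float, v_j: float) -> float:
    """Hypothetical two-score model (assumed form, not taken from this page).

    Each player has two scores (u, v). The logit of the win probability
    is the antisymmetric product u_i * v_j - v_i * u_j.
    """
    return 1.0 / (1.0 + math.exp(-(u_i * v_j - v_i * u_j)))


# With every second score fixed to 1, the assumed model reduces to Elo.
p_elo = elo_win_probability(1.2, 0.4)
p_two = two_score_win_probability(1.2, 1.0, 0.4, 1.0)
```

Under this assumed form, fixing every second score to 1 recovers plain Elo, so the two-score model is a strict extension: it can represent every Elo-rated game and also games whose logits are not additive in a single score.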
Taxonomy
Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Gambling Behavior and Treatments
