The Evaluation of Rating Systems in Online Free-for-All Games
Arman Dehpanah, Muheeb Faizan Ghori, Jonathan Gemmell, Bamshad Mobasher

TL;DR
This paper conducts a comprehensive evaluation of six metrics for assessing rating systems in online free-for-all games, highlighting the strengths and weaknesses of each and recommending NDCG as the most effective measure.
Contribution
It introduces an extensive comparison of evaluation metrics for rating systems in online games and advocates for NDCG as the most suitable metric.
Findings
Some metrics ignore rank deviations.
Many metrics are affected by new players.
NDCG addresses the limitations of the other metrics.
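To make the NDCG finding concrete, here is a minimal sketch of how NDCG could score one free-for-all match. It uses the standard information-retrieval definition (gain discounted by the log of the position), not necessarily the exact formulation in the paper. The function names dcg and ndcg, and the choice of relevance as "number of players minus observed rank plus one", are assumptions made for illustration.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(position + 1),
    with positions counted from 1."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))


def ndcg(predicted_order, observed_rank):
    """NDCG of a predicted finishing order for one free-for-all match.

    predicted_order: player ids, best predicted player first.
    observed_rank:   dict mapping player id -> actual rank (1 = winner).

    Assumption (not from the paper): the relevance of a player is
    n - rank + 1, so the winner has the largest gain.
    """
    n = len(predicted_order)
    gains = [n - observed_rank[p] + 1 for p in predicted_order]
    ideal = sorted(gains, reverse=True)  # gains in the true finishing order
    return dcg(gains) / dcg(ideal)


# Hypothetical four-player match: a finished first, d finished last.
observed = {"a": 1, "b": 2, "c": 3, "d": 4}

perfect = ndcg(["a", "b", "c", "d"], observed)
top_swap = ndcg(["b", "a", "c", "d"], observed)     # error among the top ranks
bottom_swap = ndcg(["a", "b", "d", "c"], observed)  # error among the bottom ranks
```

In this sketch a perfect prediction scores 1, and swapping the top two players lowers the score more than swapping the bottom two. That position sensitivity is what separates NDCG from a metric that only checks whether the predicted order matches or not.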
Abstract
Online competitive games have become increasingly popular. To ensure an exciting and competitive environment, these games routinely attempt to match players with similar skill levels. Matching players is often accomplished through a rating system. There has been an increasing amount of research on developing such rating systems. However, less attention has been given to the evaluation metrics of these systems. In this paper, we present an exhaustive analysis of six metrics for evaluating rating systems in online competitive games. We compare traditional metrics such as accuracy. We then introduce other metrics adapted from the field of information retrieval. We evaluate these metrics against several well-known rating systems on a large real-world dataset of over 100,000 free-for-all matches. Our results show stark differences in their utility. Some metrics do not consider deviations…
Taxonomy
Topics: Sports Analytics and Performance · Gambling Behavior and Treatments · Digital Games and Media
