Rethinking Evaluation Metric for Probability Estimation Models Using Esports Data
Euihyeon Choi, Jooyoung Kim, Wonkyung Lee

TL;DR
This paper introduces Balance score, a new evaluation metric for probability estimation models, with a focus on win probability estimation in esports. The metric targets limitations of accuracy, the Brier score, and the Expected Calibration Error (ECE), and the authors demonstrate its effectiveness on simulations and real data.
Contribution
The paper proposes Balance score as a novel metric for evaluating probability estimation models, intended to improve on traditional metrics both in esports and in general applications.
Findings
Balance score effectively approximates true calibration error.
Extensive evaluations show the metric's robustness and applicability.
Proposed metric outperforms traditional metrics in reliability assessment.
Abstract
Probability estimation models play an important role in various fields, such as weather forecasting, recommendation systems, and sports analysis. When several models estimate probabilities, it is difficult to evaluate which one gives reliable probabilities, since the ground-truth probabilities are not available. Win probability estimation for esports, which calculates the win probability under a certain game state, is one of the actively studied areas of probability estimation. However, most previous works evaluated their models using accuracy, a metric that can only measure discrimination performance. In this work, we first investigate the Brier score and the Expected Calibration Error (ECE) as replacements for accuracy as a performance evaluation metric for win probability estimation models in the esports field. Based on the analysis, we…
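The difference between accuracy and the two calibration-aware metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of the Brier score and a binary, equal-width-bin ECE; the function names, the 10-bin default, and the toy data are assumptions made for illustration, not taken from the paper. Balance score itself is not defined in this summary, so it is not implemented here.

```python
import numpy as np

def accuracy(probs, outcomes, threshold=0.5):
    """Fraction of matches whose thresholded prediction equals the outcome."""
    preds = (probs >= threshold).astype(int)
    return float(np.mean(preds == outcomes))

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return float(np.mean((probs - outcomes) ** 2))

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Binary ECE with equal-width bins (bin count is an assumption).

    For each bin, compare the mean predicted win probability with the
    observed win rate, then average the gaps weighted by bin size.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so bin indices run from 0 to n_bins - 1
    # and a prediction of exactly 1.0 falls into the last bin.
    bin_ids = np.digitize(probs, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        gap = abs(probs[mask].mean() - outcomes[mask].mean())
        ece += mask.mean() * gap
    return float(ece)

# Toy data (hypothetical): five game states, four of which end in a win.
outcomes = np.array([1, 1, 1, 1, 0])
calibrated = np.full(5, 0.8)      # predicts 80% everywhere
overconfident = np.full(5, 1.0)   # predicts 100% everywhere

for name, probs in [("calibrated", calibrated), ("overconfident", overconfident)]:
    print(name,
          accuracy(probs, outcomes),
          brier_score(probs, outcomes),
          expected_calibration_error(probs, outcomes))
```

Both hypothetical models reach the same accuracy on this toy data, while the overconfident one gets a worse Brier score and a larger ECE. This is the sense in which accuracy measures only discrimination.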
Taxonomy
Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Digital Games and Media
