Exploring the Difficulty of Estimating Win Probability: A Simulation Study
Ryan S. Brill, Ronald Yurko, and Abraham J. Wyner

TL;DR
This simulation study shows why win probability is hard to estimate accurately in sports analytics: the noisy, highly correlated play-by-play data used to fit such models inflate estimator bias and variance.
Contribution
It provides a simulation-based analysis showing the difficulty of modeling true win probabilities from noisy, correlated observational data, highlighting bias and variance issues.
Findings
Dependence structure inflates bias and variance
Effective sample size is reduced in observational data
Confidence intervals need to be wide for valid coverage
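The effective sample size finding can be illustrated with the standard Kish design-effect approximation for clustered data. This is a rough sketch only, not the paper's calculation: it assumes equal-sized clusters (games) and a single intra-cluster correlation, and the function name, the `icc` parameter, and the example numbers are hypothetical.

```python
def kish_effective_sample_size(n_obs: int, cluster_size: int, icc: float) -> float:
    """Approximate effective sample size for equally sized clusters.

    Uses the Kish design effect 1 + (m - 1) * icc, where m is the number of
    observations per cluster and icc is the intra-cluster correlation.
    This is a textbook approximation, not the method used in the paper.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    design_effect = 1.0 + (cluster_size - 1) * icc
    return n_obs / design_effect


# Hypothetical numbers: 1,000 games with 150 plays each.
n_games, plays_per_game = 1000, 150
n_plays_total = n_games * plays_per_game

for icc in (0.0, 0.1, 0.5, 1.0):
    n_eff = kish_effective_sample_size(n_plays_total, plays_per_game, icc)
    print(f"icc={icc:.1f}  effective sample size ~ {n_eff:,.0f}")
```

At `icc = 0` every play counts as an independent observation; at `icc = 1` the data carry only as much information as the number of games. All plays within a game share one win/loss label, so play-by-play data behave more like the second case than the raw row count suggests.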
Abstract
Estimating win probability is one of the classic modeling tasks of sports analytics. Many widely used win probability estimators use machine learning to fit the relationship between a binary win/loss outcome variable and certain game-state variables. To illustrate just how difficult it is to accurately fit such a model from noisy and highly correlated observational data, in this paper we conduct a simulation study. We create a simplified random walk version of football in which true win probability at each game-state is known, and we see how well a model recovers it. We find that the dependence structure of observational play-by-play data substantially inflates the bias and variance of estimators and lowers the effective sample size. Further, to achieve approximately valid marginal coverage, win probability confidence intervals need to be substantially wide. Concisely, these are high…
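The idea of a game whose true win probability is known at every state can be sketched with a much simpler toy than the paper's simplified football. In the sketch below, which is an assumption-laden illustration and not the authors' simulation, the score differential moves by +1 or -1 on each play, ties at the end are broken by a fair coin, true win probability is computed by backward induction, and a naive per-state empirical average is compared against it. All names (`true_win_prob`, `simulate_games`, `empirical_win_prob`) and parameter values are hypothetical.

```python
import numpy as np


def true_win_prob(n_plays, p_up=0.5):
    """Exact win probability by backward induction.

    wp[t, d + n_plays] = P(win | differential d after t plays).
    Only states with |d| <= t are reachable.
    """
    size = 2 * n_plays + 1
    diffs = np.arange(-n_plays, n_plays + 1)
    wp = np.zeros((n_plays + 1, size))
    # Terminal rule: win if ahead, coin flip if tied, lose if behind.
    wp[n_plays] = np.where(diffs > 0, 1.0, np.where(diffs == 0, 0.5, 0.0))
    for t in range(n_plays - 1, -1, -1):
        wp[t, 1:-1] = p_up * wp[t + 1, 2:] + (1 - p_up) * wp[t + 1, :-2]
        wp[t, -1] = 1.0  # unreachable edge states, filled for tidiness
    return wp


def simulate_games(n_games, n_plays, p_up=0.5, seed=0):
    """Simulate random-walk games; return differential paths and outcomes."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1, 1], size=(n_games, n_plays), p=[1 - p_up, p_up])
    start = np.zeros((n_games, 1), dtype=int)
    paths = np.concatenate([start, steps.cumsum(axis=1)], axis=1)
    final = paths[:, -1]
    coin = (rng.random(n_games) < 0.5).astype(int)
    win = np.where(final > 0, 1, np.where(final < 0, 0, coin))
    return paths, win


def empirical_win_prob(paths, win, n_plays):
    """Naive estimator: fraction of wins among visits to each (t, d) state.

    Every row from the same game carries the same win/loss label, which is
    the source of dependence in the play-by-play data.
    """
    size = 2 * n_plays + 1
    wins = np.zeros((n_plays + 1, size))
    counts = np.zeros((n_plays + 1, size))
    t_idx = np.broadcast_to(np.arange(n_plays + 1), paths.shape)
    labels = np.broadcast_to(win[:, None], paths.shape)
    np.add.at(counts, (t_idx, paths + n_plays), 1)
    np.add.at(wins, (t_idx, paths + n_plays), labels)
    with np.errstate(invalid="ignore", divide="ignore"):
        est = wins / counts  # NaN where a state was never visited
    return est, counts


n_plays = 20
wp_true = true_win_prob(n_plays)
paths, win = simulate_games(n_games=2000, n_plays=n_plays, seed=0)
wp_hat, counts = empirical_win_prob(paths, win, n_plays)

visited = counts > 0
mae = np.average(np.abs(wp_hat[visited] - wp_true[visited]),
                 weights=counts[visited])
print(f"visit-weighted mean absolute error: {mae:.4f}")
```

Rerunning with different seeds or fewer games gives a feel for how much the estimate at a given state moves from one simulated dataset to the next, even though the row count (games times plays) looks large.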
Taxonomy
Topics: Consumer Market Behavior and Pricing · Business Strategies and Innovation · Agricultural Innovations and Practices
