Application of the Pythagorean Expected Wins Percentage and Cross-Validation Methods in Estimating Team Quality
Christopher Boudreaux, Justin Ehrlich, Shankar Ghimire, and Shane Sanders

TL;DR
This paper evaluates the Pythagorean Expected Wins model and introduces contest theory-based alternatives, demonstrating that the serial CSF significantly improves team quality estimation accuracy using MLB data.
Contribution
It transforms and compares contest theory models for estimating team quality, showing the serial CSF outperforms existing models in predictive accuracy.
Findings
Serial CSF reduces root mean squared error in wins estimation.
Serial CSF significantly improves team quality estimates.
Model comparison validates contest theory alternatives for sports analytics.
Abstract
The Pythagorean Expected Wins Percentage Model was developed by Bill James to estimate a baseball team's expected wins percentage over the course of a season. As such, the model can be used to assess how lucky or unlucky a team was over the course of a season. From a sports analytics perspective, such information is valuable because it indicates how reproducible a given result may be in the next time period. In contest-theoretic (game-theoretic) parlance, the original model represents a (restricted) Tullock contest success function (CSF). We transform, estimate, and compare the original model and two alternative models from contest theory, the serial and difference-form CSFs, using MLB team win data (2003 to 2015) and perform a cross-validation exercise to test the accuracy of the alternative models. The serial CSF estimator dramatically improves wins estimation…
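As a rough illustration of the original model described in the abstract, the sketch below computes a Pythagorean expected wins percentage in its commonly cited form, runs scored squared divided by the sum of runs scored squared and runs allowed squared. Leaving the exponent free gives a Tullock-style ratio, and fixing it at 2 gives the restricted case. The function names, the `exponent` parameter, and the sample team figures are hypothetical and are not taken from the paper; the serial and difference-form CSFs are not shown because the abstract does not give their functional forms.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected wins percentage as a Tullock-style ratio.

    With exponent=2.0 this is the commonly cited Bill James form:
    RS^2 / (RS^2 + RA^2). The free exponent is an illustrative
    assumption, not the paper's estimated specification.
    """
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def luck_in_wins(actual_wins, games, runs_scored, runs_allowed, exponent=2.0):
    """Actual wins minus model-expected wins.

    A positive value suggests the team won more games than its run
    totals would predict; a negative value suggests the opposite.
    """
    expected_wins = games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)
    return actual_wins - expected_wins


if __name__ == "__main__":
    # Hypothetical season line, made up for illustration only.
    pct = pythagorean_win_pct(800, 700)
    diff = luck_in_wins(actual_wins=90, games=162, runs_scored=800, runs_allowed=700)
    print(f"expected wins pct: {pct:.3f}")
    print(f"actual minus expected wins: {diff:+.1f}")
```

A team whose actual wins sit well above or below the expected figure is the kind of case the abstract describes as lucky or unlucky, and therefore less likely to repeat the same record in the next period.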
Taxonomy
Topics: Sports Analytics and Performance
