Application of the Pythagorean Expected Wins Percentage and Cross-Validation Methods in Estimating Team Quality

Christopher Boudreaux, Justin Ehrlich, Shankar Ghimire, and Shane Sanders

arXiv:2201.01168·stat.AP·January 5, 2022

TL;DR

This paper evaluates the Pythagorean Expected Wins model and introduces contest theory-based alternatives, demonstrating that the serial CSF significantly improves team quality estimation accuracy using MLB data.

Contribution

It transforms and compares contest theory models for estimating team quality, showing the serial CSF outperforms existing models in predictive accuracy.

Findings

01 Serial CSF reduces root mean squared error in wins estimation.

02 Serial CSF significantly improves team quality estimates.

03 Model comparison validates contest theory alternatives for sports analytics.

Abstract

The Pythagorean Expected Wins Percentage Model was developed by Bill James to estimate a baseball team's expected wins percentage over the course of a season. As such, the model can be used to assess how lucky or unlucky a team was over the course of a season. From a sports analytics perspective, such information is valuable because it indicates how reproducible a given result may be in the next time period. In contest-theoretic (game-theoretic) parlance, the original model represents a (restricted) Tullock contest success function (CSF). We transform, estimate, and compare the original model and two alternative models from contest theory, the serial and difference-form CSFs, using MLB team win data (2003 to 2015), and we perform a cross-validation exercise to test the accuracy of the alternative models. The serial CSF estimator dramatically improves wins estimation…
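
As a rough illustration of the original model described in the abstract, the sketch below computes the Pythagorean expected wins percentage as a Tullock CSF, using the exponent of 2 from Bill James's original formula. The function names, the `luck` helper, and the example run totals are hypothetical and chosen only for illustration; they are not taken from the paper, and the serial and difference-form CSFs are not shown because the abstract does not give their functional forms.

```python
def tullock_csf(x: float, y: float, r: float) -> float:
    """Tullock contest success function: x^r / (x^r + y^r).

    x and y are the two sides' non-negative inputs and r is the exponent.
    """
    if x < 0 or y < 0:
        raise ValueError("inputs must be non-negative")
    xr, yr = x ** r, y ** r
    if xr + yr == 0:
        # Assumption for this sketch: equal odds when both inputs are zero.
        return 0.5
    return xr / (xr + yr)


def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Pythagorean expected wins percentage.

    This is the Tullock CSF applied to runs scored and runs allowed;
    exponent=2.0 corresponds to the original formula.
    """
    return tullock_csf(runs_scored, runs_allowed, exponent)


def luck(actual_wins: int, games: int,
         runs_scored: float, runs_allowed: float) -> float:
    """Actual wins minus Pythagorean expected wins (hypothetical helper).

    Positive values suggest a team won more games than its run totals imply.
    """
    expected_wins = games * pythagorean_win_pct(runs_scored, runs_allowed)
    return actual_wins - expected_wins


if __name__ == "__main__":
    # Made-up season totals, not real MLB data.
    pct = pythagorean_win_pct(800, 700)
    print(f"expected win pct: {pct:.3f}")
    print(f"luck over 162 games with 95 wins: {luck(95, 162, 800, 700):+.1f}")
```

With these made-up totals, the expected wins percentage is about 0.566, so a team that actually won 95 of 162 games would be read as having outperformed its run differential by a few wins.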

Taxonomy

Topics: Sports Analytics and Performance