Filling the Gaps: A Multiple Imputation Approach to Estimating Aging Curves in Baseball
Quang Nguyen, Gregory J. Matthews

TL;DR
This paper uses multiple imputation to estimate aging curves in baseball while accounting for missing seasons and player dropout, which otherwise lead to overestimated curves.
Contribution
It applies a multiple imputation framework for multilevel data to estimate aging curves for offensive players in Major League Baseball, treating unobserved seasons and player dropout as a missing data problem.
Findings
Aging curves constructed without the unobserved seasons overestimate performance.
Estimates obtained from multiple imputation of the missing seasons address this overestimation.
Simulations show how different dropout mechanisms affect the estimated aging curves.
Abstract
In sports, an aging curve depicts the relationship between average performance and age in athletes' careers. This paper investigates the aging curves for offensive players in Major League Baseball. We study this problem in a missing data context and account for different types of dropouts of baseball players during their careers. We employ a multiple imputation framework for multilevel data to impute the player performance associated with the missing seasons, and estimate the aging curves based on the imputed datasets. We then evaluate the effects of different dropout mechanisms on the aging curves through simulation, before applying our method to analyze MLB player data from past seasons. Results suggest an overestimation of the aging curves constructed without considering the unobserved seasons, whereas estimates obtained from multiple imputation address this shortcoming.
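The overestimation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes a hypothetical quadratic aging curve, an invented dropout rule (a player leaves after the first season scoring below 70), and a simplified imputation model with one intercept per player standing in for a full multilevel model. It also skips the parameter-uncertainty draws that a proper multiple imputation would include. All names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulate complete careers (all numbers are illustrative assumptions) ---
n_players, ages = 1000, np.arange(21, 40)
age_c = ages - 29                                  # centre age at an assumed peak of 29
true_curve = 100.0 - 0.4 * age_c**2                # hypothetical quadratic aging curve
player_effect = rng.normal(0.0, 10.0, size=(n_players, 1))
noise = rng.normal(0.0, 3.0, size=(n_players, ages.size))
y_full = true_curve + player_effect + noise

# --- Dropout: a player leaves for good after the first season below 70 ---
below = (y_full < 70.0).astype(int)
prior_bad = np.cumsum(below, axis=1) - below       # bad seasons before each age
observed = prior_bad == 0                          # first season is always observed

# --- Naive curve: average only the observed seasons at each age ---
naive = np.nanmean(np.where(observed, y_full, np.nan), axis=0)

# --- Imputation model: player intercept + quadratic in age (within estimator) ---
X = np.column_stack([age_c, age_c**2]).astype(float)          # (n_ages, 2)
X_all = np.broadcast_to(X, (n_players, ages.size, 2))
n_obs = observed.sum(axis=1, keepdims=True)                   # seasons per player
X_mean = (X_all * observed[..., None]).sum(axis=1) / n_obs    # (n_players, 2)
y_mean = np.where(observed, y_full, 0.0).sum(axis=1, keepdims=True) / n_obs
Xd = (X_all - X_mean[:, None, :])[observed]                   # demeaned covariates
yd = (y_full - y_mean)[observed]                              # demeaned outcomes
beta, *_ = np.linalg.lstsq(Xd, yd, rcond=None)

intercept = y_mean - X_mean @ beta[:, None]                   # one intercept per player
fitted = intercept + X @ beta                                 # predictions for every season
resid = (y_full - fitted)[observed]
resid_sd = np.sqrt((resid**2).sum() / (resid.size - n_players - 2))

# --- Multiple imputation: fill missing seasons m times, then pool the curves ---
m = 20
curves = []
for _ in range(m):
    draw = fitted + rng.normal(0.0, resid_sd, size=fitted.shape)
    completed = np.where(observed, y_full, draw)              # keep observed values as-is
    curves.append(completed.mean(axis=0))
mi_curve = np.mean(curves, axis=0)                            # pooled point estimate

truth = y_full.mean(axis=0)                                   # known only in simulation
print("naive error at oldest age:  ", naive[-1] - truth[-1])
print("imputed error at oldest age:", mi_curve[-1] - truth[-1])
```

Because only the stronger players survive to older ages in this setup, the naive per-age average sits above the complete-data average at the end of the age range, while the pooled imputed curve should land closer to it. Changing the dropout rule is one way to explore how different dropout mechanisms shift the estimated curve.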
Taxonomy
Topics: Sports Analytics and Performance
