Trouble With The Curve: Improving MLB Pitch Classification
Michael A. Pane, Samuel L. Ventura, Rebecca C. Steorts, A.C. Thomas

TL;DR
This paper introduces a new model-based clustering method using Gaussian mixtures and an adjustment factor to improve MLB pitch classification accuracy and interpretability, addressing limitations of previous manual and neural network approaches.
Contribution
The paper presents a novel clustering algorithm and classification method for pitches, enhancing accuracy and interpretability over existing techniques.
Findings
Effective classification of diverse pitch types
Improved accuracy over traditional methods
Applicable to various pitchers and pitch styles
Abstract
The PITCHf/x database has allowed the statistical analysis of Major League Baseball (MLB) to flourish since its introduction in late 2006. Using PITCHf/x, pitches have been classified by hand, requiring considerable effort, or using neural network clustering and classification, which is often difficult to interpret. To address these issues, we use model-based clustering with a multivariate Gaussian mixture model and an appropriate adjustment factor as an alternative to current methods. Furthermore, we describe a new pitch classification algorithm based on our clustering approach to address the problems of pitch misclassification. We illustrate our methods for various pitchers from the PITCHf/x database that covers a wide variety of pitch types.
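To make the idea of model-based clustering concrete, the following is a minimal sketch in Python, not the authors' implementation. It fits Gaussian mixtures by EM to synthetic two-feature "pitches" and compares candidate numbers of pitch types with the standard Bayesian information criterion (BIC). Several things here are assumptions made for illustration: the feature pair (speed, vertical break), the synthetic data, the diagonal covariance (a simplification of the full multivariate model), the plain BIC in place of the paper's adjustment factor (which the abstract does not specify), and all function names.

```python
import math
import random


def make_demo_pitches(n_per_type=100, seed=0):
    """Synthetic (speed mph, vertical break in) pairs; NOT real PITCHf/x data."""
    rng = random.Random(seed)
    centers = [(93.0, -5.0), (78.0, 6.0)]  # hypothetical fastball and curveball
    return [(rng.gauss(s, 1.0), rng.gauss(b, 1.0))
            for s, b in centers for _ in range(n_per_type)]


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def fit_gmm(data, k, n_iter=100):
    """Fit a k-component diagonal-covariance Gaussian mixture by EM.

    Returns (weights, means, variances, log_likelihood). The log-likelihood
    is the one computed in the final E-step.
    """
    n, d = len(data), len(data[0])
    # Deterministic initialisation: evenly spaced points of the sorted data.
    ordered = sorted(data)
    means = [list(ordered[int((c + 0.5) * n / k)]) for c in range(k)]
    gmean = [sum(x[j] for x in data) / n for j in range(d)]
    gvar = [sum((x[j] - gmean[j]) ** 2 for x in data) / n for j in range(d)]
    variances = [gvar[:] for _ in range(k)]
    weights = [1.0 / k] * k
    loglik = float("-inf")
    for _ in range(n_iter):
        # E-step: responsibilities via a numerically stable log-sum-exp.
        resp, loglik = [], 0.0
        for x in data:
            logs = [math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c])
                    for c in range(k)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += norm
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            for j in range(d):
                means[c][j] = sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                variances[c][j] = sum(r[c] * (x[j] - means[c][j]) ** 2
                                      for r, x in zip(resp, data)) / nc + 1e-6
    return weights, means, variances, loglik


def bic(data, k):
    """Plain BIC (lower is better); the paper's adjustment is not modelled."""
    n, d = len(data), len(data[0])
    loglik = fit_gmm(data, k)[3]
    n_params = (k - 1) + 2 * k * d  # mixing weights + means + variances
    return -2.0 * loglik + n_params * math.log(n)


def select_k(data, max_k=4):
    """Pick the number of components with the lowest BIC."""
    return min(range(1, max_k + 1), key=lambda k: bic(data, k))
```

Calling `select_k(make_demo_pitches())` compares one to four candidate pitch types on the synthetic data. Each fitted component has an explicit mean and variance per feature, which is the interpretability advantage the abstract contrasts with neural network approaches.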
Taxonomy
Topics: Sports Analytics and Performance · Sports Dynamics and Biomechanics · Multidisciplinary Science and Engineering Research
