Simulating MLB Seasons using Bayesian Inference and Random Walks
Simon Cha

TL;DR
This paper presents a Bayesian simulation framework for predicting MLB season outcomes by modeling team performance and game results using probabilistic methods, random walks, and Kalman filters.
Contribution
It introduces a novel combination of Bayesian inference, random walks, and Kalman filters for simulating and forecasting MLB season results from limited initial data.
Findings
Produced distributions of team wins across simulated seasons
Estimated playoff probabilities for all teams
Demonstrated effective season outcome forecasting
Abstract
As a dedicated follower of sports statistics, and with the MLB season beginning in late March, I set out to predict how many wins each team would accumulate by the end of the 162-game season. The goal was to build a simulation framework capable of forecasting the remainder of the season, starting from a 20-game burn-in period to establish initial estimates of team strength. My approach used a Bayesian inference model incorporating team win percentage, batting average, and pitching ERA to construct a posterior distribution of win probability for each matchup. For each game, I sampled from the posterior and simulated the outcome using a Bernoulli trial. Because future matchup inputs were unobserved, I forecasted batting averages using random walks and modeled pitching ERA with Kalman filters. After simulating many seasons, the model produced a distribution of win totals for all 30 teams…
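The three building blocks named in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the Beta prior, the noise variances, the clamping bounds, and all function names are hypothetical. Because the abstract does not say how batting average and ERA enter the posterior, the sketch keeps the random walk and the Kalman filter as standalone helpers and draws win probability from the burn-in record alone.

```python
import random

def sample_win_prob(wins, losses, rng, prior_a=1.0, prior_b=1.0):
    """Draw a win probability from a Beta posterior.

    Assumption: a Beta(prior_a, prior_b) prior updated with the burn-in
    win/loss record; the paper's actual posterior also uses covariates.
    """
    return rng.betavariate(prior_a + wins, prior_b + losses)

def random_walk_step(value, rng, sigma=0.005, lo=0.150, hi=0.400):
    """Advance a batting average by one Gaussian random-walk step.

    Assumption: sigma and the clamp range [lo, hi] are illustrative.
    """
    return min(hi, max(lo, value + rng.gauss(0.0, sigma)))

def kalman_update(mean, var, obs, process_var=0.01, obs_var=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    Assumption: the latent ERA follows a random walk (state transition = 1)
    and is observed directly with Gaussian noise.
    """
    pred_var = var + process_var           # predict: uncertainty grows
    gain = pred_var / (pred_var + obs_var) # Kalman gain
    new_mean = mean + gain * (obs - mean)  # update toward the observation
    new_var = (1.0 - gain) * pred_var      # uncertainty shrinks
    return new_mean, new_var

def simulate_season(burn_wins, burn_losses, rng, total_games=162):
    """Simulate the games remaining after burn-in for one team.

    Each game samples a win probability from the posterior, then resolves
    the game with a Bernoulli trial.
    """
    wins = burn_wins
    remaining = total_games - burn_wins - burn_losses
    for _ in range(remaining):
        p = sample_win_prob(burn_wins, burn_losses, rng)
        if rng.random() < p:  # Bernoulli trial with success probability p
            wins += 1
    return wins

# Usage: a hypothetical team that went 12-8 in the 20-game burn-in.
rng = random.Random(0)
totals = [simulate_season(12, 8, rng) for _ in range(1000)]
print("mean simulated wins:", sum(totals) / len(totals))
```

Repeating `simulate_season` many times yields an empirical distribution of final win totals for one team. Playoff probabilities would then follow from comparing such totals across teams within each simulated season.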
Taxonomy
Topics: Sports Analytics and Performance · Probability and Statistical Research · Data Analysis with R
