Machine Learning for Soccer Match Result Prediction
Rory Bunker, Calvin Yeung, Keisuke Fujii

TL;DR
This paper reviews machine learning methods for soccer match result prediction, highlighting the current best-performing models, available datasets, and future research directions, including model interpretability and richer data features.
Contribution
It provides a comprehensive overview of existing models, datasets, and evaluation methods, and identifies gaps such as the need for deeper comparisons and more interpretable models.
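The summary lists evaluation methods among the chapter's topics but does not name them here. One metric widely used for ordered win/draw/loss forecasts is the ranked probability score (RPS), which penalizes probability mass placed far from the observed outcome. The Python below is a minimal sketch of RPS for a single match. It assumes the outcomes are ordered as home win, draw, away win. The function name and the example forecast are hypothetical and are not taken from the chapter.

```python
from typing import Sequence


def ranked_probability_score(probs: Sequence[float], outcome: int) -> float:
    """RPS for one match over ordered outcomes, e.g. (home win, draw, away win).

    probs: predicted probabilities in outcome order (assumed to sum to 1).
    outcome: index of the observed outcome.
    Lower is better; 0 means a fully confident, correct forecast.
    """
    r = len(probs)
    if r < 2:
        raise ValueError("need at least two ordered outcomes")
    if not 0 <= outcome < r:
        raise ValueError("outcome index out of range")
    cum_pred = 0.0  # cumulative predicted probability up to category i
    cum_obs = 0.0   # cumulative observed indicator up to category i
    total = 0.0
    # The last cumulative pair is always (1, 1), so it is skipped.
    for i in range(r - 1):
        cum_pred += probs[i]
        cum_obs += 1.0 if i == outcome else 0.0
        total += (cum_pred - cum_obs) ** 2
    return total / (r - 1)


# Hypothetical forecast: 50% home win, 30% draw, 20% away win; the home side won.
forecast = (0.5, 0.3, 0.2)
print(ranked_probability_score(forecast, outcome=0))
```

Because RPS compares cumulative distributions, it scores a draw forecast as closer to a home win than an away-win forecast is. This ordering-awareness is the usual argument for preferring it over plain accuracy for three-way match outcomes.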
Findings
Gradient-boosted trees like CatBoost perform best on goal-only datasets.
Deep learning and Random Forest models require further comparison.
Incorporating spatiotemporal and event data could improve predictions.
Abstract
Machine learning has become a common approach to predicting the outcomes of soccer matches, and the body of literature in this domain has grown substantially in the past decade and a half. This chapter discusses available datasets, the types of models and features, and ways of evaluating model performance in this application domain. The aim of this chapter is to give a broad overview of the current state and potential future developments in machine learning for soccer match results prediction, as a resource for those interested in conducting future studies in the area. Our main findings are that while gradient-boosted tree models such as CatBoost, applied to soccer-specific ratings such as pi-ratings, are currently the best-performing models on datasets containing only goals as the match features, there needs to be a more thorough comparison of the performance of deep learning models…
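The abstract names pi-ratings as the soccer-specific features behind the best goal-only results. As a minimal sketch of how such a rating is maintained, the Python below updates each team's separate home and away ratings after a match, based on the gap between the observed and the rating-implied goal difference. The rating-to-goals mapping, the logarithmic error weighting, and the constants (B, C, LAMBDA, GAMMA) reflect our reading of Constantinou and Fenton's original pi-rating definition. They are not stated on this page and should be checked against that source. The team names are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumed constants, following our reading of the original pi-rating definition.
B = 10.0        # base of the rating-to-goal-difference mapping
C = 3.0         # scale constant
LAMBDA = 0.035  # learning rate for the rating at the venue where the match was played
GAMMA = 0.7     # fraction of that change passed on to the team's other-venue rating


@dataclass
class PiRating:
    home: float = 0.0  # rating when playing at home
    away: float = 0.0  # rating when playing away


def expected_goal_diff(rating: float) -> float:
    """Map a rating to an expected goal difference against an average team."""
    magnitude = B ** (abs(rating) / C) - 1.0
    return magnitude if rating >= 0 else -magnitude


def update(ratings: dict, home_team: str, away_team: str,
           home_goals: int, away_goals: int) -> None:
    """Update both teams' pi-ratings in place after one match."""
    h = ratings.setdefault(home_team, PiRating())
    a = ratings.setdefault(away_team, PiRating())

    # Predicted vs observed goal difference, from the home side's perspective.
    predicted = expected_goal_diff(h.home) - expected_goal_diff(a.away)
    observed = home_goals - away_goals
    error = abs(observed - predicted)

    # Diminishing-returns weighting: large errors move ratings less than linearly.
    weighted = C * math.log10(1.0 + error)

    # The home side gains if it beat expectations; the away side moves oppositely.
    sign_home = 1.0 if observed > predicted else -1.0
    delta_home = sign_home * weighted * LAMBDA
    delta_away = -delta_home

    h.home += delta_home
    h.away += delta_home * GAMMA
    a.away += delta_away
    a.home += delta_away * GAMMA


# Hypothetical usage: two new teams, home side wins 2-0.
ratings = {}
update(ratings, "Team A", "Team B", 2, 0)
print(ratings["Team A"], ratings["Team B"])
```

One way to build the goal-only models the abstract describes is to collect both teams' pre-match ratings (and the expected goal difference they imply) as features and fit a gradient-boosted classifier such as CatBoost. That step is not shown here, and this is not necessarily the chapter's exact pipeline.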
Taxonomy
Topics: Sports Analytics and Performance
