Kaggle forecasting competitions: An overlooked learning opportunity
Casper Solheim Bojer, Jens Peder Meldgaard

TL;DR
This paper reviews six Kaggle forecasting competitions, highlighting their unique data characteristics and the effectiveness of ensemble models, gradient boosting, and neural networks in real-world forecasting tasks.
Contribution
It provides a comprehensive analysis of Kaggle forecasting competitions, emphasizing their value for advancing forecasting methods and understanding real-world data complexities.
Findings
Kaggle datasets often have higher intermittence and entropy than M-competitions.
Global ensemble models tend to outperform local single models.
Gradient boosted decision trees and neural networks show strong forecasting performance.
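To make the term "intermittence" concrete, the sketch below computes two simple indicators for a demand series: the share of zero-demand periods and an average demand interval (ADI), approximated as the number of periods divided by the number of non-zero periods. These are illustrative assumptions and not necessarily the measures used in the paper; the function names and the toy series are hypothetical.

```python
from typing import Sequence


def zero_share(series: Sequence[float]) -> float:
    """Fraction of periods with zero demand (assumed intermittence indicator)."""
    if not series:
        raise ValueError("series must be non-empty")
    return sum(1 for x in series if x == 0) / len(series)


def average_demand_interval(series: Sequence[float]) -> float:
    """Approximate ADI: number of periods divided by number of non-zero periods."""
    if not series:
        raise ValueError("series must be non-empty")
    nonzero = sum(1 for x in series if x != 0)
    if nonzero == 0:
        return float("inf")  # no demand observed at all
    return len(series) / nonzero


# Hypothetical toy series, not taken from any competition dataset.
smooth = [5, 7, 6, 8, 5, 6, 7, 9]
intermittent = [0, 0, 3, 0, 0, 0, 1, 0]

print(zero_share(smooth), average_demand_interval(smooth))
print(zero_share(intermittent), average_demand_interval(intermittent))
```

Under these definitions, a series with demand in every period has a zero share of 0 and an ADI of 1, and both values grow as demand becomes more sporadic.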
Abstract
Competitions play an invaluable role in the field of forecasting, as exemplified through the recent M4 competition. The competition received attention from both academics and practitioners and sparked discussions around the representativeness of the data for business forecasting. Several competitions featuring real-life business forecasting tasks on the Kaggle platform have, however, been largely ignored by the academic community. We believe the learnings from these competitions have much to offer to the forecasting community and provide a review of the results from six Kaggle competitions. We find that most of the Kaggle datasets are characterized by higher intermittence and entropy than the M-competitions and that global ensemble models tend to outperform local single models. Furthermore, we find the strong performance of gradient boosted decision trees, increasing success of neural…
