Forecasting with Deep Learning: Beyond Average of Average of Average Performance
Vitor Cerqueira, Luis Roque, Carlos Soares

TL;DR
This paper introduces a multi-perspective evaluation framework for forecasting models, revealing that deep learning methods like NHITS outperform classical techniques mainly in multi-step ahead forecasting, with performance varying across conditions.
Contribution
It proposes a novel evaluation framework that considers different forecasting aspects, demonstrating the importance of nuanced assessment over single-score metrics.
Findings
NHITS generally outperforms classical methods
Performance advantage of NHITS depends on forecasting horizon
NHITS is less effective when dealing with anomalies
Abstract
Accurate evaluation of forecasting models is essential for ensuring reliable predictions. Current practices for evaluating and comparing forecasting models focus on summarising performance into a single score, using metrics such as SMAPE. We hypothesize that averaging performance over all samples dilutes relevant information about the relative performance of models, particularly under conditions in which this relative performance differs from the overall accuracy. We address this limitation by proposing a novel framework for evaluating univariate time series forecasting models from multiple perspectives, such as one-step ahead forecasting versus multi-step ahead forecasting. We show the advantages of this framework by comparing a state-of-the-art deep learning approach with classical forecasting techniques. While classical methods (e.g. ARIMA) are long-standing approaches to…
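The idea that a single averaged score can hide horizon-specific differences can be illustrated with a small sketch. This is not the authors' framework or code. It assumes the common SMAPE form 2|y - ŷ| / (|y| + |ŷ|), and the data, the names model_a and model_b, and the helper smape_by_horizon are all hypothetical, made up for illustration.

```python
from statistics import mean


def smape(actual, predicted):
    """SMAPE in percent, using the 2|y - yhat| / (|y| + |yhat|) form (an assumption)."""
    terms = []
    for y, yhat in zip(actual, predicted):
        denom = abs(y) + abs(yhat)
        # Convention assumed here: a term counts as 0 when both values are 0.
        terms.append(0.0 if denom == 0 else 2.0 * abs(y - yhat) / denom)
    return 100.0 * mean(terms)


def smape_by_horizon(actuals, forecasts):
    """SMAPE for each forecasting step; each row is one forecast window."""
    horizon = len(actuals[0])
    return [
        smape([row[h] for row in actuals], [row[h] for row in forecasts])
        for h in range(horizon)
    ]


# Toy, made-up data: two forecast windows, horizon of three steps.
actuals = [[100.0, 100.0, 100.0], [200.0, 200.0, 200.0]]
model_a = [[100.0, 100.0, 150.0], [200.0, 200.0, 300.0]]  # exact early, poor at step 3
model_b = [[110.0, 110.0, 110.0], [220.0, 220.0, 220.0]]  # uniformly 10% too high

by_a = smape_by_horizon(actuals, model_a)
by_b = smape_by_horizon(actuals, model_b)

# Every step has the same number of samples, so the mean of the
# per-step scores equals the single overall score.
overall_a = mean(by_a)
overall_b = mean(by_b)

print("per-step SMAPE, model_a:", by_a)
print("per-step SMAPE, model_b:", by_b)
print("overall SMAPE:", overall_a, overall_b)
```

In this toy setup model_b has the lower overall score, yet model_a is better at the first two steps. A per-horizon breakdown exposes that difference, while the single average does not.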
Taxonomy
Topics: Forecasting Techniques and Applications
