What Matters in Deep Learning for Time Series Forecasting?
Valentina Moretti, Andrea Cini, Ivan Marisca, Cesare Alippi

TL;DR
This paper analyzes the design principles of deep learning models for time series forecasting, emphasizing the importance of foundational aspects like locality and globality over specific layers, and calls for improved benchmarking practices.
Contribution
It provides a framework for understanding design trade-offs in forecasting architectures and introduces a model card for better characterization of models based on key design choices.
Findings
Locality and globality are crucial for accurate forecasting.
Simple architectures can match state-of-the-art performance.
Implementation details significantly impact empirical results.
Abstract
Deep learning models have grown increasingly popular in time series applications. However, the large quantity of newly proposed architectures, together with often contradictory empirical results, makes it difficult to assess which components contribute significantly to final performance. We aim to make sense of the current design space of deep learning architectures for time series forecasting by discussing the design dimensions and trade-offs that can explain observed results, which are often unexpected. This paper discusses the necessity of grounding model design on principles for forecasting groups of time series and how such principles can be applied to current models. In particular, we assess how concepts such as locality and globality apply to recent forecasting architectures. We show that accounting for these aspects can be more relevant for achieving accurate results than adopting…
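To make the locality/globality distinction concrete, here is a minimal sketch (not the authors' models or code) that fits three linear autoregressive forecasters by least squares on a panel of related series. The global model shares one set of lag weights and one intercept across all series. The local model fits separate weights per series. The hybrid shares lag weights but adds a per-series intercept, a crude stand-in for local components such as learnable per-series embeddings. The function names (`make_lagged`, `fit_sse`, `compare`), the linear AR form, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def make_lagged(series: np.ndarray, window: int):
    """Turn one 1-D series into (X, y): each row of X holds the `window` past values."""
    X = np.stack([series[t - window:t] for t in range(window, len(series))])
    y = series[window:]
    return X, y


def fit_sse(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float]:
    """Ordinary least squares; returns the weights and the in-sample sum of squared errors."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ w
    return w, float(resid @ resid)


def compare(panel: np.ndarray, window: int = 3) -> dict:
    """panel has shape (n_series, length). Fits global, hybrid and local linear AR models."""
    n_series = panel.shape[0]
    pairs = [make_lagged(s, window) for s in panel]
    ones = [np.ones((len(y), 1)) for _, y in pairs]
    y_all = np.concatenate([y for _, y in pairs])

    # Global: one set of lag weights and one intercept shared by every series.
    Xg = np.vstack([np.hstack([X, o]) for (X, _), o in zip(pairs, ones)])
    _, sse_global = fit_sse(Xg, y_all)

    # Hybrid: shared lag weights plus a per-series intercept (one-hot series id).
    onehots = [np.tile(np.eye(n_series)[i], (len(y), 1)) for i, (_, y) in enumerate(pairs)]
    Xh = np.vstack([np.hstack([X, oh]) for (X, _), oh in zip(pairs, onehots)])
    _, sse_hybrid = fit_sse(Xh, y_all)

    # Local: a separate set of lag weights and intercept for each series.
    local = [fit_sse(np.hstack([X, o]), y) for (X, y), o in zip(pairs, ones)]
    sse_local = sum(sse for _, sse in local)

    return {
        "global_params": Xg.shape[1],
        "hybrid_params": Xh.shape[1],
        "local_params": sum(w.size for w, _ in local),
        "sse_global": sse_global,
        "sse_hybrid": sse_hybrid,
        "sse_local": sse_local,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(200)
    # Synthetic panel: series share a seasonal shape but differ in level.
    panel = np.stack([
        level + np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(t.size)
        for level in (0.0, 5.0, -3.0)
    ])
    for key, value in compare(panel).items():
        print(key, value)
```

Because each model's function class contains the previous one's (global within hybrid within local), in-sample error can only decrease from global to hybrid to local, while the local parameter count grows linearly with the number of series. Whether that extra local capacity pays off can therefore only be judged on held-out data.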
Peer Reviews
Decision · Submitted to ICLR 2026
This paper offers a meta-analytical and diagnostic contribution rather than a new predictive model. Its novelty lies in articulating a unified conceptual framework for analyzing deep time series forecasting architectures and demonstrating that benchmarking inconsistencies, not model innovation, explain many reported performance gains. The introduction of a forecasting model card is a valuable proposal for standardizing model documentation, enhancing reproducibility and interpretability across st
- While comprehensive, the study focuses solely on deterministic point forecasting. This leaves out probabilistic and uncertainty-aware approaches, which are central to modern time series applications. The authors acknowledge this but could have elaborated on how their findings generalize to probabilistic settings.
- Although the paper references major forecasting works, it under-engages with recent multimodal and foundation time series models (e.g., TFT, Chronos, pretrained time-series transfo
- Timely and important: Addresses fundamental benchmarking issues affecting the entire time series forecasting community.
- Rigorous empirical work: Comprehensive ablation studies with controlled comparisons across multiple design dimensions.
- Actionable template: The forecasting model card could standardize future research and improve reproducibility.
- Given the paper's broad claims about deep learning for time series forecasting, the experimental scope (4 datasets, long-range forecasting only, no probabilistic forecasting) seems insufficient to support such general conclusions.
- While experienced practitioners may anticipate some findings (e.g., that preprocessing matters), the systematic quantification of these effects is valuable. However, the paper lacks surprising insights that would significantly advance our understanding.
- The paper
1. The paper calls for rethinking the benchmarks of the time series forecasting domain, which I agree is both necessary and important.
2. The authors call for a better understanding of the architectural design space, which might help address the fact that the time series forecasting community has made little progress in recent years.
1. How can you be sure that it is the `model card` rather than the `dataset and benchmarks` that has gone wrong? **Imagine that the CV community were using MNIST rather than CIFAR, ImageNet or other datasets; researchers could then also publish hundreds of papers per year proposing all kinds of CNN/Transformer designs pursuing $0.1\%$ improvements on MNIST**. "Oh, my method classifies MNIST better than the existing SOTA." **In that case, you could also do experiments and find "hey, perhaps using vit i
Taxonomy
Topics: Forecasting Techniques and Applications · Stock Market Forecasting Methods · Time Series Analysis and Forecasting
