Three-Stage Learning Unlocks Strong Performance in Simple Models for Long-Term Time Series Forecasting
Zhenan Yu, Guangxin Jiang, Jin Yang

TL;DR
This paper introduces STAIR, a three-stage training paradigm that enhances simple models for long-term time series forecasting by progressively capturing shared and variable-specific dynamics without complex architectures.
Contribution
The paper proposes STAIR, a three-stage training framework that improves simple temporal models for long-term forecasting through structured training and residual learning, without adding complex architectural modules.
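To make the staged idea concrete, here is a minimal sketch of how a shared fit, a per-variable adaptation, and a residual correction could be chained around a plain linear temporal map. This is not the authors' implementation: the summary does not specify the stage details, so the ridge-regression form, the shrinkage toward the shared map, and the names `make_windows`, `ridge_fit`, and `fit_three_stage` are all assumptions chosen for illustration.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (input window, target window) pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def ridge_fit(X, Y, lam, prior=None):
    """Ridge regression that shrinks toward `prior` (zero if not given)."""
    d = X.shape[1]
    if prior is None:
        prior = np.zeros((d, Y.shape[1]))
    delta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ (Y - X @ prior))
    return prior + delta

def fit_three_stage(data, lookback=24, horizon=8, lam=1.0):
    """data has shape (time, variables). All three stages are assumed forms."""
    windows = [make_windows(data[:, v], lookback, horizon)
               for v in range(data.shape[1])]
    n_var = len(windows)

    # Stage 1 (assumed): one linear map shared by every variable.
    X_all = np.concatenate([X for X, _ in windows])
    Y_all = np.concatenate([Y for _, Y in windows])
    W_shared = ridge_fit(X_all, Y_all, lam)

    # Stage 2 (assumed): per-variable maps, shrunk toward the shared map.
    W_var = [ridge_fit(X, Y, lam, prior=W_shared) for X, Y in windows]

    # Stage 3 (assumed): a second map fitted to what stage 2 leaves unexplained.
    W_res = [ridge_fit(X, Y - X @ W, lam) for (X, Y), W in zip(windows, W_var)]

    def mse(maps):
        return float(np.mean([np.mean((Y - X @ W) ** 2)
                              for (X, Y), W in zip(windows, maps)]))

    errors = [mse([W_shared] * n_var),
              mse(W_var),
              mse([a + b for a, b in zip(W_var, W_res)])]

    def forecast(history):
        """Forecast `horizon` steps for each variable from the latest window."""
        recent = history[-lookback:]
        return np.stack([recent[:, v] @ (W_var[v] + W_res[v])
                         for v in range(n_var)], axis=1)

    return forecast, errors

# Synthetic example data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
t = np.arange(300)
data = np.stack([np.sin(2 * np.pi * t / 24 + p) for p in (0.0, 0.5, 1.0)], axis=1)
data = data + 0.1 * rng.standard_normal(data.shape)

forecast, errors = fit_three_stage(data)
print(errors)                # training MSE after each stage
print(forecast(data).shape)  # (horizon, variables)
```

In this sketch each stage starts from the previous stage's solution, so the training error cannot increase from one stage to the next. The residual stage reuses the same linear form only for brevity; a different residual learner could be substituted without changing the overall structure.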
Findings
STAIR matches or outperforms recent strong baselines on nine benchmarks.
The approach effectively captures shared and variable-specific dynamics.
STAIR maintains the simplicity of the core temporal predictor.
Abstract
Recent studies on long-term time series forecasting have shown that simple linear models and MLP-based predictors can achieve strong performance without increasingly complex architectures. However, many competitive baselines still rely on structural priors such as frequency-domain modeling, explicit decomposition, multi-scale mixing, or sophisticated cross-variable interaction modules, while paying less attention to how simple temporal mappings should be trained and organized. In this paper, we propose STAIR, short for Stagewise Temporal Adaptation via Individualization and Residual Learning, a training paradigm for long-term time series forecasting that aims to unlock the capacity of simple temporal mapping models without introducing complex architectural modules. STAIR decomposes forecasting ability into three progressive stages: it first learns common temporal dynamics across…
