Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts
Yuze Dong, Jinsong Wu

TL;DR
This paper introduces TS_Adam, a simple modification to the Adam optimizer that makes it more responsive to distribution shifts in time series forecasting, improving accuracy without added complexity.
Contribution
The paper proposes TS_Adam, a lightweight variant of Adam that removes the second-order bias correction, improving adaptability to non-stationary data in forecasting tasks.
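For reference, Adam's update in the bias-corrected step-size form of Kingma and Ba is written below, alongside the variant it suggests for TS_Adam. This is a sketch that assumes "second-order bias correction" means the factor √(1 − β₂ᵗ) in the step size. The symbols follow the original Adam paper, not this one.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\alpha_t^{\text{Adam}} &= \alpha\,\frac{\sqrt{1-\beta_2^{\,t}}}{1-\beta_1^{\,t}}, \qquad
\alpha_t^{\text{TS}} = \frac{\alpha}{1-\beta_1^{\,t}} \\
\theta_t &= \theta_{t-1} - \alpha_t\,\frac{m_t}{\sqrt{v_t}+\hat{\epsilon}} \\
\frac{\alpha_t^{\text{TS}}}{\alpha_t^{\text{Adam}}} &= \frac{1}{\sqrt{1-\beta_2^{\,t}}} \;\ge\; 1
\end{aligned}
```

Under this reading, the two optimizers differ only by the factor 1/√(1 − β₂ᵗ). With β₂ = 0.999, that factor is about 31.6 at t = 1 and about 1.26 at t = 1000, and it approaches 1 as t grows.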
Findings
TS_Adam reduces MSE by 12.8% on the ETT datasets.
TS_Adam reduces MAE by 5.7% relative to Adam.
The method requires no additional hyperparameters.
Abstract
Time-series forecasting often faces challenges from non-stationarity, particularly distributional drift, where the data distribution evolves over time. This dynamic behavior can undermine the effectiveness of adaptive optimizers such as Adam, which are typically designed for stationary objectives. In this paper, we revisit Adam in the context of non-stationary forecasting and identify that its second-order bias correction limits responsiveness to shifting loss landscapes. To address this, we propose TS_Adam, a lightweight variant that removes the second-order correction from the learning rate computation. This simple modification improves adaptability to distributional drift while preserving the optimizer's core structure and requiring no additional hyperparameters. TS_Adam integrates easily into existing models and consistently improves performance across long- and short-term…
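The following is a minimal pure-Python sketch of one update step for both optimizers. It assumes TS_Adam is Adam with the √(1 − β₂ᵗ) factor removed from the step size, while the first-moment correction is kept. The names `MomentState`, `init_state`, `adam_like_step`, `step_size_ratio`, and the `correct_second` flag are hypothetical and do not come from the authors' code. As in Kingma and Ba's step-size form, ε is added to √vₜ, which differs slightly from implementations that add it to the bias-corrected √v̂ₜ.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class MomentState:
    """First/second moment estimates (one entry per scalar parameter) and step count."""
    m: List[float] = field(default_factory=list)
    v: List[float] = field(default_factory=list)
    t: int = 0


def init_state(n: int) -> MomentState:
    """Zero-initialised moments for n scalar parameters (hypothetical helper)."""
    return MomentState(m=[0.0] * n, v=[0.0] * n)


def adam_like_step(params: List[float], grads: List[float], state: MomentState,
                   lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
                   eps: float = 1e-8, correct_second: bool = True) -> List[float]:
    """Return updated params after one step.

    correct_second=True  -> Adam, using Kingma & Ba's bias-corrected step size.
    correct_second=False -> assumed TS_Adam: the sqrt(1 - beta2**t) factor is dropped.
    """
    state.t += 1
    t = state.t
    # First-moment bias correction is kept in both variants.
    step_size = lr / (1.0 - beta1 ** t)
    if correct_second:
        # Second-moment bias correction, folded into the step size.
        step_size *= math.sqrt(1.0 - beta2 ** t)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and squared gradient.
        state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * g
        state.v[i] = beta2 * state.v[i] + (1.0 - beta2) * g * g
        new_params.append(p - step_size * state.m[i] / (math.sqrt(state.v[i]) + eps))
    return new_params


def step_size_ratio(t: int, beta2: float = 0.999) -> float:
    """TS_Adam step size divided by Adam step size at step t (eps ignored)."""
    return 1.0 / math.sqrt(1.0 - beta2 ** t)
```

Take a single parameter at 0.0 with gradient 1.0 and the default settings. Adam's first step moves it by about `lr` (to roughly -0.001). The assumed TS_Adam variant moves it by about `lr / sqrt(1 - beta2)`, roughly -0.0316. The gap then shrinks toward 1 as `state.t` grows.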
Taxonomy
Topics: Data Stream Mining Techniques · Traffic Prediction and Management Techniques · Forecasting Techniques and Applications
