Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting
Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long

TL;DR
This paper introduces Non-stationary Transformers, a framework that pairs Series Stationarization with De-stationary Attention to improve forecasting accuracy on non-stationary time series, outperforming the baseline Transformers it is applied to.
Contribution
It proposes a novel framework with Series Stationarization and De-stationary Attention modules to address over-stationarization and enhance deep model capability.
Findings
Reduces MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer.
Consistently outperforms baseline Transformers in forecasting accuracy.
Establishes state-of-the-art results on benchmark datasets.
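For readers unfamiliar with how such percentages are derived, a relative MSE reduction is the drop in mean squared error divided by the baseline error. The sketch below illustrates the arithmetic only; the helper names and all numbers are hypothetical toy values, not results from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_mse, improved_mse):
    """Percentage by which improved_mse is lower than baseline_mse."""
    return 100.0 * (baseline_mse - improved_mse) / baseline_mse


# Toy values chosen for illustration only (not from the paper).
y_true = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [2.0, 3.0, 4.0, 5.0]  # every error is 1.0, so MSE is 1.0
improved_pred = [1.5, 2.5, 3.5, 4.5]  # every error is 0.5, so MSE is 0.25

baseline_mse = mse(y_true, baseline_pred)
improved_mse = mse(y_true, improved_pred)
reduction = relative_reduction(baseline_mse, improved_mse)
print(f"MSE {baseline_mse} -> {improved_mse}: {reduction}% reduction")
```

Because the measure is relative to the baseline, a reduction near 50% means the error was roughly halved.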
Abstract
Transformers have shown great power in time series forecasting due to their global-range modeling ability. However, their performance can degenerate terribly on non-stationary real-world data in which the joint distribution changes over time. Previous studies primarily adopt stationarization to attenuate the non-stationarity of original series for better predictability. But the stationarized series deprived of inherent non-stationarity can be less instructive for real-world bursty events forecasting. This problem, termed over-stationarization in this paper, leads Transformers to generate indistinguishable temporal attentions for different series and impedes the predictive capability of deep models. To tackle the dilemma between series predictability and model capability, we propose Non-stationary Transformers as a generic framework with two interdependent modules: Series…
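The Series Stationarization module named in the abstract can be pictured as a normalize-then-restore wrapper around the forecaster. The following is a minimal Python sketch, assuming each input window is normalized by its own mean and standard deviation along time and that the same statistics are restored on the output. The function names, the `eps` constant, and the `naive_forecast` stand-in for the Transformer are hypothetical and not taken from the authors' code. De-stationary Attention is not covered here.

```python
from statistics import fmean, pstdev


def stationarize(window, eps=1e-5):
    """Normalize one input window; return it with the statistics removed."""
    mu = fmean(window)
    # eps keeps the division safe for constant windows (assumed value).
    sigma = (pstdev(window) ** 2 + eps) ** 0.5
    normalized = [(x - mu) / sigma for x in window]
    return normalized, mu, sigma


def destationarize(prediction, mu, sigma):
    """Restore the original scale and level on the model output."""
    return [y * sigma + mu for y in prediction]


def naive_forecast(window, horizon):
    """Hypothetical stand-in for the forecasting model: repeat the last value."""
    return [window[-1]] * horizon


window = [10.0, 12.0, 14.0, 16.0]
normalized, mu, sigma = stationarize(window)
forecast = destationarize(naive_forecast(normalized, horizon=2), mu, sigma)
print(forecast)
```

The model only ever sees the normalized window, which is what makes the series easier to predict. It is also why, per the abstract, information about the original non-stationarity can be lost unless it is reintroduced elsewhere.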
Taxonomy
Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Forecasting Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · 1x1 Convolution · Convolution · Absolute Position Encodings · Reversible Residual Block · Position-Wise Feed-Forward Layer · Byte Pair Encoding
