Are Transformers Effective for Time Series Forecasting?
Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu

TL;DR
This paper questions the effectiveness of Transformer models for long-term time series forecasting, showing that simple linear models can outperform complex Transformers on multiple datasets due to their better preservation of temporal information.
Contribution
The study introduces LTSF-Linear, a set of simple one-layer linear models that outperforms Transformer-based models on long-term time series forecasting, challenging the current research focus on Transformer-based solutions for this task.
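To make the idea of a one-layer linear forecaster concrete, here is a minimal sketch that maps a look-back window directly to a forecast horizon with a single weight matrix. The function name `linear_forecast`, the `weights` and `bias` arguments, and the hand-picked weight values are all illustrative assumptions; they are not trained parameters and are not taken from the paper's code.

```python
from typing import List


def linear_forecast(
    history: List[float],
    weights: List[List[float]],
    bias: List[float],
) -> List[float]:
    """Map a look-back window (length L) to a horizon (length T) with one linear layer.

    `weights` has T rows of L entries each; `bias` has T entries.
    These names and shapes are assumptions made for this sketch.
    """
    lookback = len(history)
    if any(len(row) != lookback for row in weights):
        raise ValueError("each weight row must match the look-back length")
    if len(bias) != len(weights):
        raise ValueError("bias must have one entry per forecast step")
    # Each forecast step is a weighted sum over the look-back positions.
    return [
        sum(w * x for w, x in zip(row, history)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical, hand-picked weights (not learned):
# step 1 repeats the last value, step 2 extrapolates the last difference.
example_weights = [
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, -1.0, 2.0],
]
example_bias = [0.0, 0.0]
forecast = linear_forecast([1.0, 2.0, 3.0, 4.0], example_weights, example_bias)
print(forecast)  # [4.0, 5.0]
```

Because every weight is tied to a fixed position in the window, reordering the inputs changes the forecast, which is one way to see how such a model keeps ordering information.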
Findings
LTSF-Linear outperforms Transformer models on nine datasets.
Transformers' permutation-invariant attention leads to temporal information loss.
Simple linear models can effectively capture temporal relations in time series.
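The order-insensitivity of self-attention mentioned in the findings can be illustrated with a small sketch. It assumes a single attention head with identity query, key, and value projections and no positional encoding; the function `self_attention` and the toy token values are hypothetical and chosen only for this demonstration. Under these assumptions, shuffling the input tokens only shuffles the outputs in the same way, so the layer itself carries no information about the original order.

```python
import math
from typing import List

Vector = List[float]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Assumptions for this sketch: identity Q/K/V projections and
    no positional encoding.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(a * value[d] for a, value in zip(attn, tokens)) for d in range(dim)]
        )
    return outputs


# Toy tokens (hypothetical values) and an arbitrary reordering of them.
tokens = [[0.1, 0.3], [0.5, -0.2], [0.9, 0.4]]
order = [2, 0, 1]
shuffled = [tokens[i] for i in order]

out = self_attention(tokens)
out_shuffled = self_attention(shuffled)

# Each shuffled output should match the original output of the same token.
same_up_to_order = all(
    math.isclose(a, b, abs_tol=1e-9)
    for j, idx in enumerate(order)
    for a, b in zip(out_shuffled[j], out[idx])
)
print(same_up_to_order)
```

In practice, positional encodings are added to the tokens to reintroduce order; the sketch omits them on purpose to isolate the behavior of the attention operation alone.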
Abstract
Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task. Despite the growing performance over the past few years, we question the validity of this line of research in this work. Specifically, the Transformer is arguably the most successful solution for extracting the semantic correlations among the elements in a long sequence. However, in time series modeling, we are to extract the temporal relations in an ordered set of continuous points. While employing positional encoding and using tokens to embed sub-series in Transformers facilitate preserving some ordering information, the nature of the permutation-invariant self-attention mechanism inevitably results in temporal information loss. To validate our claim, we introduce a set of embarrassingly simple one-layer linear models named LTSF-Linear for comparison. Experimental…
Taxonomy
Topics: Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications · Advanced Text Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Byte Pair Encoding · Dense Connections · Absolute Position Encodings · Dropout · Adam
