The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting
Lefei Shen, Mouxiang Chen, Han Fu, Xiaoxue Ren, Xiaoyun Joy Wang, Jianling Sun, Zhuo Li, Chenghao Liu

TL;DR
This paper systematically compares Transformer architectures for long-term time series forecasting (LTSF) and finds that design choices such as bi-directional attention and the direct-mapping paradigm significantly improve performance.
Contribution
It introduces a novel taxonomy to disentangle Transformer design variations, enabling clearer comparisons and identifying optimal architectural configurations for LTSF.
Findings
Bi-directional attention with joint-attention is most effective.
Complete forecasting aggregation improves performance.
Direct-mapping paradigm outperforms autoregressive approaches.
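To make two of these findings concrete, the sketch below contrasts a bi-directional attention mask with a causal one, and a direct-mapping forecast with an autoregressive one. It is an illustration under assumptions, not the paper's implementation: it reads "bi-directional" as attention without a causal mask and "direct-mapping" as emitting the whole horizon in one forward pass. `attention_mask`, `direct_forecast`, `autoregressive_forecast`, and `toy_model` are hypothetical names, and `toy_model` is a stand-in for a trained Transformer.

```python
from typing import Callable, List

# Hypothetical model signature: (context values, number of outputs) -> predictions.
Model = Callable[[List[float], int], List[float]]


def attention_mask(n_tokens: int, bidirectional: bool) -> List[List[int]]:
    """Return a mask where mask[i][j] == 1 means token i may attend to token j."""
    return [
        [1 if bidirectional or j <= i else 0 for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


def direct_forecast(model: Model, history: List[float], horizon: int) -> List[float]:
    """Direct-mapping: a single call produces every step of the horizon."""
    return model(history, horizon)


def autoregressive_forecast(
    model: Model, history: List[float], horizon: int, step: int = 1
) -> List[float]:
    """Autoregressive: predict `step` values, append them to the context, repeat."""
    context = list(history)
    forecast: List[float] = []
    while len(forecast) < horizon:
        n_out = min(step, horizon - len(forecast))
        chunk = model(context, n_out)
        forecast.extend(chunk)
        context.extend(chunk)  # predictions are fed back as inputs
    return forecast


def toy_model(context: List[float], n_out: int) -> List[float]:
    """Assumed stand-in for a trained model: repeats the mean of the last 3 values."""
    window = context[-3:]
    mean = sum(window) / len(window)
    return [mean] * n_out


if __name__ == "__main__":
    print(attention_mask(3, bidirectional=False))  # lower-triangular (causal)
    print(attention_mask(3, bidirectional=True))   # all ones (bi-directional)
    history = [1.0, 2.0, 3.0]
    print(direct_forecast(toy_model, history, horizon=2))
    print(autoregressive_forecast(toy_model, history, horizon=2))
```

In the autoregressive loop each predicted value becomes part of the next step's input, so early errors can propagate through the horizon; the direct variant never feeds its own outputs back. This is one commonly cited intuition for the gap and is offered here as context rather than as the paper's stated explanation.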
Abstract
Transformer-based models have recently become dominant in Long-term Time Series Forecasting (LTSF), yet the variations in their architecture, such as encoder-only, encoder-decoder, and decoder-only designs, raise a crucial question: What Transformer architecture works best for LTSF tasks? However, existing models are often tightly coupled with various time-series-specific designs, making it difficult to isolate the impact of the architecture itself. To address this, we propose a novel taxonomy that disentangles these designs, enabling clearer and more unified comparisons of Transformer architectures. Our taxonomy considers key aspects such as attention mechanisms, forecasting aggregations, forecasting paradigms, and normalization layers. Through extensive experiments, we uncover several key insights: bi-directional attention with joint-attention is most effective; more complete…
Taxonomy
Topics: Time Series Analysis and Forecasting · Complex Systems and Time Series Analysis
