Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers
Xin Cheng, Xiuying Chen, Shuqi Li, Di Luo, Xun Wang, Dongyan Zhao, Rui Yan

TL;DR
GridTST introduces a novel multi-directional attention mechanism in vanilla Transformers, modeling both temporal and variate dependencies in grid-structured time series data, achieving state-of-the-art forecasting accuracy.
Contribution
The paper proposes GridTST, a new Transformer-based model that combines vertical and horizontal attention to better capture multivariate and temporal dependencies in time series.
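As a rough illustration of attention applied along two directions of a grid, the sketch below runs plain scaled dot-product self-attention first across time steps and then across variates. It is not the authors' implementation: it assumes "horizontal" refers to the time axis and "vertical" to the variate axis, omits learned projections, multiple heads, and residual connections, and all function names, shapes, and the ordering of the two passes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Single-head scaled dot-product self-attention with no learned weights.
    # tokens: (..., L, D); attention mixes information along the L axis.
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ tokens                    # (..., L, D)

def attend_over_time(grid):
    # grid: (N variates, T time steps, D features).
    # Each variate attends across its own T time tokens.
    return self_attention(grid)

def attend_over_variates(grid):
    # Swap axes so variates form the sequence axis: (T, N, D).
    swapped = np.swapaxes(grid, 0, 1)
    # At each time step, the N variate tokens attend to one another.
    return np.swapaxes(self_attention(swapped), 0, 1)

# Toy grid with hypothetical sizes: 3 variates, 5 time steps, 4 features.
rng = np.random.default_rng(0)
grid = rng.normal(size=(3, 5, 4))
out = attend_over_variates(attend_over_time(grid))
print(out.shape)  # (3, 5, 4)
```

Because each pass only mixes tokens along one axis, the output keeps the grid shape, so the two passes can be stacked in either order.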
Findings
Achieves state-of-the-art results on multiple datasets.
Effectively models both temporal and variate correlations.
Improves forecasting accuracy over existing methods.
Abstract
Time series prediction is crucial for understanding and forecasting complex dynamics in various domains, ranging from finance and economics to climate and healthcare. Based on Transformer architecture, one approach involves encoding multiple variables from the same timestamp into a single temporal token to model global dependencies. In contrast, another approach embeds the time points of individual series into separate variate tokens. The former method faces challenges in learning variate-centric representations, while the latter risks missing essential temporal information critical for accurate forecasting. In our work, we introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions based on a vanilla Transformer. We regard the input time series data as a grid, where the x-axis represents the time steps and the y-axis…
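The two tokenizations contrasted in the abstract can be illustrated on a toy array. This is only a sketch: the array sizes and variable names are hypothetical, and raw values stand in for the learned embeddings a real model would apply to each token.

```python
import numpy as np

# Toy multivariate series with hypothetical sizes: T time steps, N variates.
T, N = 6, 3
series = np.arange(T * N, dtype=float).reshape(T, N)  # shape (T, N)

# Temporal tokens: one token per timestamp, holding all N variates at that step.
temporal_tokens = series      # T tokens, each of dimension N

# Variate tokens: one token per series, holding all T time points of that series.
variate_tokens = series.T     # N tokens, each of dimension T

print(temporal_tokens.shape)  # (6, 3)
print(variate_tokens.shape)   # (3, 6)
```

In the first view a token mixes all variables observed at one moment; in the second a token carries the full history of a single variable, which is the trade-off the abstract describes.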
Taxonomy
Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings · Dropout
