Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

Xin Cheng, Xiuying Chen, Shuqi Li, Di Luo, Xun Wang, Dongyan Zhao, Rui Yan

arXiv:2405.13810 · cs.LG · May 24, 2024 · 3 cites

TL;DR

GridTST applies multi-directional attention within a vanilla Transformer. It treats multivariate time series as a grid so that it can model both temporal and variate dependencies, and the authors report state-of-the-art forecasting accuracy.

Contribution

The paper proposes GridTST, a new Transformer-based model that combines vertical and horizontal attention to better capture multivariate and temporal dependencies in time series.
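The following is a rough illustration of attention applied in two directions over a grid. It assumes the abstract's grid convention (time steps on the x-axis, variates on the y-axis) and a grid of already-embedded tokens indexed as grid[variate][time]. Under that assumption, horizontal attention mixes one variate's tokens across time, and vertical attention mixes one time position's tokens across variates. The helper names (attend, horizontal_attention, vertical_attention), the identity query/key/value projections, the single head, and the horizontal-then-vertical ordering are all hypothetical simplifications. They are not GridTST's actual code, which lives in the official repository.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # hypothetical layout: grid[variate][time] -> embedding


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens: List[Vector]) -> List[Vector]:
    # Single-head scaled dot-product self-attention with identity Q/K/V
    # projections (a simplification; a real model learns these matrices).
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def horizontal_attention(grid: Grid) -> Grid:
    # Along the x-axis: each variate's row of tokens attends across time.
    return [attend(row) for row in grid]


def vertical_attention(grid: Grid) -> Grid:
    # Along the y-axis: each time column of tokens attends across variates.
    n_vars, n_steps = len(grid), len(grid[0])
    columns = [attend([grid[v][t] for v in range(n_vars)]) for t in range(n_steps)]
    return [[columns[t][v] for t in range(n_steps)] for v in range(n_vars)]


# Toy grid: 2 variates x 3 time steps x embedding size 2.
grid = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]]
out = vertical_attention(horizontal_attention(grid))
print(len(out), len(out[0]), len(out[0][0]))  # 2 3 2
```

Both passes preserve the grid's shape, so they can be stacked. A full Transformer block would also add learned projections, multiple heads, residual connections, and feed-forward layers around each attention step.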

Findings

1. Achieves state-of-the-art results on multiple datasets.
2. Effectively models both temporal and variate correlations.
3. Improves forecasting accuracy over existing methods.

Abstract

Time series prediction is crucial for understanding and forecasting complex dynamics in various domains, ranging from finance and economics to climate and healthcare. Based on Transformer architecture, one approach involves encoding multiple variables from the same timestamp into a single temporal token to model global dependencies. In contrast, another approach embeds the time points of individual series into separate variate tokens. The former method faces challenges in learning variate-centric representations, while the latter risks missing essential temporal information critical for accurate forecasting. In our work, we introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions based on a vanilla Transformer. We regard the input time series data as a grid, where the $x$-axis represents the time steps and the $y$-axis…
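The two tokenizations the abstract contrasts can be made concrete with a toy multivariate series stored as x[variate][time]. This is a minimal pure-Python sketch under that layout assumption. The function names and the shared linear embedding are hypothetical and are not taken from the paper or its repository.

```python
from typing import List

Series = List[List[float]]  # hypothetical layout: x[variate][time]


def time_tokens(x: Series) -> List[List[float]]:
    # Vertical slice of the grid: the N variates observed at one timestamp
    # become one temporal token, giving T tokens of length N.
    n_vars, n_steps = len(x), len(x[0])
    return [[x[v][t] for v in range(n_vars)] for t in range(n_steps)]


def variate_tokens(x: Series) -> List[List[float]]:
    # Horizontal slice of the grid: one variate's full history becomes one
    # variate token, giving N tokens of length T.
    return [list(row) for row in x]


def embed(tokens: List[List[float]], weight: List[List[float]],
          bias: List[float]) -> List[List[float]]:
    # Hypothetical linear embedding shared by all tokens:
    # weight has shape [d_model][token_length], bias has length d_model.
    return [
        [sum(w * t for w, t in zip(w_row, tok)) + b for w_row, b in zip(weight, bias)]
        for tok in tokens
    ]


x = [[1.0, 2.0, 3.0, 4.0],     # variate 0 over T = 4 steps
     [5.0, 6.0, 7.0, 8.0],     # variate 1
     [9.0, 10.0, 11.0, 12.0]]  # variate 2
print(time_tokens(x)[0])     # [1.0, 5.0, 9.0]
print(variate_tokens(x)[2])  # [9.0, 10.0, 11.0, 12.0]
```

A temporal token has length N and the sequence has T tokens. Attention therefore runs over time, but every variate is already mixed into each token. A variate token has length T and the sequence has N tokens. Attention therefore runs across variates, while each series' time ordering is folded into a single embedding. This mirrors the trade-off the abstract describes.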

Code & Models

Repositories

Hannibal046/GridTST (PyTorch, official)

Taxonomy

Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings · Dropout