Attention as Robust Representation for Time Series Forecasting

PeiSong Niu; Tian Zhou; Xue Wang; Liang Sun; Rong Jin

arXiv:2402.05370 · cs.LG · February 9, 2024 · 2 cites

TL;DR

This paper proposes elevating attention weights to the main data representation in time series forecasting. Attention maps structured with global landmarks and local windows yield better robustness and accuracy than existing models.

Contribution

The paper introduces an approach that uses attention weights as the primary representation, improving robustness against noise and distribution shifts in time series forecasting.

Findings

01. Outperforms state-of-the-art models with 3.6% lower MSE

02. Attention maps serve as robust kernels for noisy data

03. Method is compatible with existing transformer architectures

Abstract

Time series forecasting is essential for many practical applications, and the adoption of transformer-based models is on the rise due to their impressive performance in NLP and CV. Transformers' key feature, the attention mechanism, dynamically fuses embeddings to enhance data representation, often relegating the attention weights themselves to a byproduct role. Yet time series data, characterized by noise and non-stationarity, poses significant forecasting challenges. Our approach elevates attention weights to the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy. Our study shows that an attention map, structured using global landmarks and local windows, acts as a robust kernel representation for data points, withstanding noise and shifts in distribution. Our method outperforms state-of-the-art models, reducing mean…
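To make the idea of "attention weights as the representation" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It splits a series into patches, scores each patch against a few global landmark patches and against its local neighbours, and returns the softmax-normalised weights as the feature vector for that patch. The function name `attention_representation`, the choice of evenly spaced patches as landmarks, the raw dot-product scoring without learned projections, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def softmax(scores, axis=-1):
    # Numerically stable softmax; entries masked with -inf receive zero weight.
    shifted = scores - np.max(scores, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_representation(series, patch_len=4, n_landmarks=3, half_window=1):
    """Return attention weights over [landmarks | local window] for each patch.

    Hypothetical helper: a real model would learn query/key projections.
    """
    # Split the series into non-overlapping patches (the "data points").
    n_patches = len(series) // patch_len
    patches = np.asarray(series[: n_patches * patch_len], dtype=float)
    patches = patches.reshape(n_patches, patch_len)

    # Assumption: global landmarks are evenly spaced patches of the series.
    landmark_idx = np.linspace(0, n_patches - 1, n_landmarks).round().astype(int)
    landmarks = patches[landmark_idx]

    scale = np.sqrt(patch_len)
    global_scores = patches @ landmarks.T / scale  # (n_patches, n_landmarks)

    # Local scores: each patch against neighbours within half_window steps.
    offsets = np.arange(-half_window, half_window + 1)
    neighbor_idx = np.arange(n_patches)[:, None] + offsets[None, :]
    valid = (neighbor_idx >= 0) & (neighbor_idx < n_patches)
    clipped = np.clip(neighbor_idx, 0, n_patches - 1)
    local_scores = np.einsum("nd,nkd->nk", patches, patches[clipped]) / scale
    # Neighbours that fall outside the series are masked out.
    local_scores = np.where(valid, local_scores, -np.inf)

    # One softmax over global and local scores; the weights are the features.
    scores = np.concatenate([global_scores, local_scores], axis=1)
    return softmax(scores, axis=1)


# Usage: a noisy sine wave of 64 points gives 16 patches of length 4.
rng = np.random.default_rng(0)
t = np.arange(64)
series = np.sin(2 * np.pi * t / 16) + 0.1 * rng.standard_normal(64)
weights = attention_representation(series)
print(weights.shape)  # (16, 6): 3 landmark weights + 3 local-window weights
```

Each row is a probability vector, so its entries are bounded and sum to one regardless of the scale of the input. That boundedness is one intuition for why such a representation could be less sensitive to noise than the raw values, although the sketch itself does not demonstrate the paper's reported results.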

Taxonomy

Topics: Neural Networks and Applications

Methods: Activation Patching