Attention as Robust Representation for Time Series Forecasting
PeiSong Niu, Tian Zhou, Xue Wang, Liang Sun, Rong Jin

TL;DR
This paper proposes elevating attention weights to the primary data representation in time series forecasting. Attention maps structured with global landmarks and local windows yield improved robustness and accuracy over existing models.
Contribution
It introduces a novel approach that uses attention weights as the primary representation, enhancing robustness against noise and distribution shifts in time series forecasting.
Findings
Outperforms state-of-the-art models with 3.6% lower MSE
Attention maps serve as robust kernels for noisy data
Method is compatible with existing transformer architectures
Abstract
Time series forecasting is essential for many practical applications, with the adoption of transformer-based models on the rise due to their impressive performance in NLP and CV. Transformers' key feature, the attention mechanism, dynamically fuses embeddings to enhance data representation, often relegating attention weights to a byproduct role. Yet, time series data, characterized by noise and non-stationarity, poses significant forecasting challenges. Our approach elevates attention weights to the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy. Our study shows that an attention map, structured using global landmarks and local windows, acts as a robust kernel representation for data points, withstanding noise and shifts in distribution. Our method outperforms state-of-the-art models, reducing mean…
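To make the idea of using attention weights as the representation more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `attention_map_features`, the choice of evenly spaced time steps as global landmarks, the causal local window, and the separate softmax over each group are all hypothetical choices made for this example. Each time step is described by its normalized similarity to a few landmarks and to its recent neighbors, rather than by a weighted sum of value embeddings.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_map_features(x, num_landmarks=4, window=3):
    """Return attention weights as features for each time step.

    x: array of shape (T, d), one d-dimensional embedding per time step.
    Output: array of shape (T, num_landmarks + window).

    Assumptions (not taken from the paper): landmarks are evenly spaced
    time steps, and the local window covers the current step plus the
    window - 1 preceding steps, clamped at the start of the series.
    """
    x = np.asarray(x, dtype=float)
    T, d = x.shape

    # Global part: scaled dot-product scores against landmark steps.
    landmark_idx = np.linspace(0, T - 1, num_landmarks).round().astype(int)
    global_scores = x @ x[landmark_idx].T / np.sqrt(d)      # (T, num_landmarks)

    # Local part: scores against the step itself and its predecessors.
    offsets = np.arange(window)
    local_idx = np.clip(np.arange(T)[:, None] - offsets[None, :], 0, T - 1)
    local_keys = x[local_idx]                                # (T, window, d)
    local_scores = np.einsum("td,twd->tw", x, local_keys) / np.sqrt(d)

    # The attention weights themselves are the representation;
    # no value vectors are mixed in.
    return np.concatenate(
        [softmax(global_scores), softmax(local_scores)], axis=1
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = rng.normal(size=(16, 8))   # toy embedded series
    feats = attention_map_features(series)
    print(feats.shape)
```

Because each group of weights is a softmax output, every feature lies in [0, 1] and each group sums to one per time step, which bounds how much a single noisy point can change the representation. A downstream forecasting head would consume these features; that part is omitted here.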
Taxonomy
Topics: Neural Networks and Applications
Methods: Activation Patching
