Are Self-Attentions Effective for Time Series Forecasting?

Dongbin Kim; Jinseong Park; Jaewook Lee; Hoki Kim

arXiv:2405.16877 · cs.LG · December 24, 2024 · 3 cites

TL;DR

This paper introduces CATS, a Transformer architecture that replaces self-attention with cross-attention for time series forecasting, achieving higher accuracy than existing models with fewer parameters.

Contribution

The paper proposes a new cross-attention-only transformer model for time series forecasting, challenging the effectiveness of self-attention in this domain.

Findings

01

CATS outperforms existing models in forecasting accuracy.

02

CATS uses fewer parameters and less memory.

03

CATS achieves the lowest mean squared error across datasets.

Abstract

Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformer models have dramatically advanced the landscape of forecasting, their effectiveness remains debated. Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches, highlighting the potential for more streamlined architectures. In this paper, we shift the focus from evaluating the overall Transformer architecture to specifically examining the effectiveness of self-attention for time series forecasting. To this end, we introduce a new architecture, Cross-Attention-only Time Series transformer (CATS), that rethinks the traditional Transformer framework by eliminating self-attention and leveraging cross-attention mechanisms instead. By establishing future horizon-dependent parameters as queries and enhanced parameter…
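The core idea in the abstract, learnable horizon-dependent parameters acting as queries that attend over the past series, can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the random "embeddings" in `past_tokens`, and the names `cross_attention` and `horizon_queries` are all assumptions made for illustration, and the example omits projections, multiple heads, and training.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one source and keys/values from another."""
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(a * v[j] for a, v in zip(attn, values)) for j in range(len(values[0]))]
        )
    return outputs, weights


random.seed(0)
DIM, NUM_PAST, HORIZON = 4, 6, 3  # hypothetical sizes

# Hypothetical stand-ins for embeddings of the past input series (keys and values).
past_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PAST)]

# One query vector per forecast step; randomly initialised here, learned in a real model.
horizon_queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(HORIZON)]

outputs, weights = cross_attention(horizon_queries, past_tokens, past_tokens)
print(len(outputs), len(outputs[0]))
```

Each forecast step gets its own query, so the past tokens never attend to one another; that is the sense in which a cross-attention-only design drops self-attention.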

Code & Models

Repositories

dongbeank/cats
PyTorch · Official

Videos

Are Self-Attentions Effective for Time Series Forecasting? · SlidesLive

Taxonomy

Topics: Forecasting Techniques and Applications

Methods: Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections