Easy attention: A simple attention mechanism for temporal predictions with transformers

Marcial Sanchis-Agudo; Yuning Wang; Roger Arnau; Luca Guastoni; Jasmin Lim; Karthik Duraisamy; Ricardo Vinuesa

arXiv:2308.12874 · cs.LG · June 5, 2025 · 1 citation

TL;DR

This paper introduces easy attention, a simplified attention mechanism for transformers that learns the attention scores directly. Compared with standard self-attention and LSTM networks, it improves robustness and performance in temporal predictions of chaotic systems.

Contribution

The paper proposes a novel mechanism, easy attention, that eliminates queries, keys, and the softmax and instead learns the attention scores directly, yielding better temporal predictions.
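The contrast with standard self-attention can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `self_attention` and `easy_attention`, the single-head setup, and the use of a value projection `w_v` inside easy attention are hypothetical choices made here. Self-attention builds a (T, T) score matrix from queries, keys, and a softmax; the easy-attention sketch replaces that whole computation with a learnable (T, T) matrix `alpha`.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Standard single-head scaled dot-product self-attention.

    x: (T, d_model) input sequence; w_q, w_k, w_v: (d_model, d_k) projections.
    Returns the (T, d_k) output and the (T, T) softmax score matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # each row sums to 1
    return scores @ v, scores

def easy_attention(x, alpha, w_v):
    """Hypothetical easy-attention sketch: no queries, keys, or softmax.

    alpha: (T, T) score matrix treated directly as a learnable parameter.
    """
    v = x @ w_v
    return alpha @ v

rng = np.random.default_rng(0)
T, d_model, d_k = 8, 4, 4
x = rng.standard_normal((T, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
alpha = rng.standard_normal((T, T))  # would be updated by the optimizer in training

out_sa, scores = self_attention(x, w_q, w_k, w_v)
out_ea = easy_attention(x, alpha, w_v)
print(out_sa.shape, out_ea.shape)  # (8, 4) (8, 4)
```

In this sketch `alpha` does not depend on the input `x`: the same scores are applied to every sequence and the sequence length `T` is fixed in advance, whereas self-attention recomputes its scores for each input.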

Findings

1. Easy attention outperforms self-attention and LSTM networks in predicting chaotic systems.
2. The method demonstrates robustness on the Lorenz system, a turbulent shear flow, and a nuclear-reactor model.
3. It simplifies the attention mechanism while maintaining or improving accuracy.

Abstract

To improve the robustness of transformer neural networks used for temporal-dynamics prediction of chaotic systems, we propose a novel attention mechanism called easy attention, which we demonstrate in time-series reconstruction and prediction. While the standard self attention only makes use of the inner product of queries and keys, it is demonstrated that the keys, queries and softmax are not necessary for obtaining the attention score required to capture long-term dependencies in temporal sequences. Through the singular-value decomposition (SVD) on the softmax attention score, we further observe that self attention compresses the contributions from both queries and keys in the space spanned by the attention score. Therefore, our proposed easy-attention method directly treats the attention scores as learnable parameters. This approach produces excellent results when reconstructing and…
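The SVD step mentioned in the abstract can be illustrated on a single softmax score matrix. This NumPy sketch uses random inputs and projections (all sizes and variable names are illustrative assumptions) and does not reproduce the paper's analysis; it only shows how the score matrix decomposes and that queries and keys affect the output solely through it.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
T, d_model, d_k = 16, 8, 4
x = rng.standard_normal((T, d_model))
w_q = rng.standard_normal((d_model, d_k))
w_k = rng.standard_normal((d_model, d_k))

# Softmax attention score for one sequence: A = softmax(Q K^T / sqrt(d_k)), shape (T, T)
q, k = x @ w_q, x @ w_k
A = softmax(q @ k.T / np.sqrt(d_k))

# SVD of the score matrix: A = U diag(s) V^T, singular values in descending order
U, s, Vt = np.linalg.svd(A)
print(np.round(s, 3))

# A is row-stochastic (A @ ones == ones), so its largest singular value is at least 1.
# The attention output is A @ V, so Q and K influence it only through A; easy
# attention therefore learns a (T, T) matrix playing the role of A directly.
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))  # True
```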

Taxonomy

Topics: Computational Physics and Python Applications · Neural Networks and Applications · Time Series Analysis and Forecasting

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Adam · Residual Connection · Layer Normalization · Label Smoothing · Byte Pair Encoding · Dropout · Absolute Position Encodings