Reorganizing attention-space geometry with expressive attention
Claudius Gros

TL;DR
This paper introduces expressive attention, which replaces the query-key dot product with its square and thereby reorganizes the geometry of attention space. On complex tasks it can outperform standard dot-product attention at no additional computational cost.
Contribution
The paper proposes a novel expressive attention mechanism based on squared dot products, enhancing attention geometry and improving performance on complex tasks without extra computation.
Findings
Expressive attention (EA) performs at least as well as standard dot-product attention (DPA).
EA outperforms DPA as task complexity increases.
EA achieves 100% performance on certain complex tasks.
Abstract
Attention regulates information transfer between tokens. For this, query and key vectors are compared, typically in terms of a scalar product, $\mathbf{Q}^T\mathbf{K}$, together with a subsequent softmax normalization. In geometric terms, the standard dot-product attention (DPA) leads to large/small attention weights for parallel/antiparallel queries and keys. Here we study expressive attention (EA), which is based on $(\mathbf{Q}^T\mathbf{K})^2$, the squared dot product. In this case, attention is enhanced when query and key are either parallel or antiparallel, and suppressed for orthogonal configurations. EA can be introduced into any attention-based code without additional compute costs or memory requirements. For a series of autoregressive prediction tasks, we find that expressive attention performs at least as well as vanilla DPA. Increasing task complexity, EA is observed to…
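The geometric contrast described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names `dpa_weights` and `ea_weights_sketch` are hypothetical, the usual scaling factor is omitted, and feeding the squared score into the same softmax is an assumption made only for illustration, since the abstract does not state how EA is normalized.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis.
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def dpa_weights(q, keys):
    # Standard dot-product attention: softmax over the scores q . k_j.
    return softmax(keys @ q)

def ea_weights_sketch(q, keys):
    # ASSUMPTION: the squared dot product is passed to the same softmax.
    # The paper's exact normalization may differ.
    return softmax((keys @ q) ** 2)

q = np.array([1.0, 0.0])
keys = np.array([
    [1.0, 0.0],   # parallel to q
    [-1.0, 0.0],  # antiparallel to q
    [0.0, 1.0],   # orthogonal to q
])

dpa = dpa_weights(q, keys)
ea = ea_weights_sketch(q, keys)

print("DPA weights (parallel, antiparallel, orthogonal):", dpa)
print("EA  weights (parallel, antiparallel, orthogonal):", ea)
```

Under these assumptions, DPA ranks the parallel key highest and the antiparallel key lowest, while the squared score gives the parallel and antiparallel keys equal weight and suppresses the orthogonal key. The only change between the two functions is one elementwise square, which is consistent with the claim that EA adds no meaningful compute or memory cost.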
Taxonomy
Topics: Mental Health Research Topics · Neuroscience, Education and Cognitive Function · Online Learning and Analytics
Methods: Attention Is All You Need · Softmax
