Elliptical Attention
Stefan K. Nielsen, Laziz U. Abdullaev, Rachel S.Y. Teo, Tan M. Nguyen

TL;DR
Elliptical Attention introduces a Mahalanobis distance-based attention mechanism that improves robustness and reduces representation collapse in transformers across language and vision tasks.
Contribution
The paper proposes Elliptical Attention, a novel attention method using Mahalanobis distance to focus on contextually relevant features, enhancing robustness and reducing collapse.
Findings
Outperforms baseline dot-product attention across language and vision tasks
Reduces representation collapse in transformer models
Enhances robustness to contaminated samples
Abstract
Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision. This dot-product self-attention computes attention weights among the input tokens using Euclidean distance, which makes the model prone to representation collapse and vulnerable to contaminated samples. In this paper, we propose using a Mahalanobis distance metric for computing the attention weights to stretch the underlying feature space in directions of high contextual relevance. In particular, we define a hyper-ellipsoidal neighborhood around each query to increase the attention weights of the tokens lying in the contextually important directions. We term this novel class of attention Elliptical Attention. Our Elliptical Attention provides two benefits: 1) reducing representation collapse and 2) enhancing the…
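As a rough illustration of the idea in the abstract, the sketch below replaces the dot-product score with a negative squared Mahalanobis distance between each query and key. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names, the restriction to a diagonal metric `m_diag`, and the `1/sqrt(d)` scaling are all choices made here for illustration, and the abstract as quoted does not say how the metric is obtained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dot_product_attention(Q, K, V):
    """Standard scaled dot-product attention, used as the baseline."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d), axis=-1) @ V


def elliptical_attention(Q, K, V, m_diag):
    """Attention with scores from a squared Mahalanobis distance.

    Q: (n_q, d) queries, K: (n_k, d) keys, V: (n_k, d_v) values.
    m_diag: (d,) nonnegative weights, i.e. a diagonal metric M (an
    assumption of this sketch). Larger entries stretch the feature
    space along that coordinate, so differences there count more.
    """
    d = Q.shape[-1]
    diff = Q[:, None, :] - K[None, :, :]              # (n_q, n_k, d)
    dist2 = np.einsum("qkd,d->qk", diff**2, m_diag)   # (q - k)^T M (q - k)
    scores = -0.5 * dist2 / np.sqrt(d)                # closer keys score higher
    return softmax(scores, axis=-1) @ V


rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(5, 8))
K = K / np.linalg.norm(K, axis=-1, keepdims=True)     # unit-norm keys
V = rng.normal(size=(5, 3))

out_identity = elliptical_attention(Q, K, V, np.ones(8))
out_baseline = dot_product_attention(Q, K, V)
```

With `m_diag` set to all ones and unit-norm keys, the squared distance expands to a per-query constant minus twice the dot product, so the two functions return the same output after the softmax. A non-uniform `m_diag` turns the spherical neighborhood around each query into an axis-aligned ellipsoid, which is the geometric picture the abstract describes.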
Taxonomy
Topics: Spatial Cognition and Navigation
Methods: Softmax · Attention Is All You Need
