Multi-manifold Attention for Vision Transformers
Dimitrios Konstantinidis, Ilias Papastratis, Kosmas Dimitropoulos, Petros Daras

TL;DR
This paper introduces a multi-manifold multihead attention mechanism for Vision Transformers that models input data across Euclidean, Symmetric Positive Definite, and Grassmann manifolds to enhance attention maps and improve vision task performance.
Contribution
It proposes a novel multi-manifold attention mechanism that integrates multiple geometric representations to refine self-attention in Vision Transformers.
Findings
Improved image classification accuracy on benchmark datasets.
Enhanced segmentation performance with the new attention mechanism.
Demonstrated effectiveness across multiple vision tasks.
Abstract
Vision Transformers are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although their performance has been greatly enhanced through highly descriptive patch embeddings and hierarchical structures, there is still limited research on utilizing additional data representations so as to refine the self-attention map of a Transformer. To address this problem, a novel attention mechanism, called multi-manifold multihead attention, is proposed in this work to substitute the vanilla self-attention of a Transformer. The proposed mechanism models the input space in three distinct manifolds, namely Euclidean, Symmetric Positive Definite and Grassmann, thus leveraging different statistical and geometrical properties of the input for the computation of a highly descriptive attention map. In this…
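The idea of scoring query-key pairs in three manifolds can be illustrated with a small NumPy sketch for a single head. This is not the authors' implementation. Everything here is an assumption made for illustration: the function names, the way each token is reshaped into a small matrix, the log-Euclidean distance on the SPD manifold, the projection distance on the Grassmann manifold, and especially the fusion step, which simply averages the three score maps before the softmax.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def euclidean_scores(q, k):
    """Vanilla scaled dot-product scores, shape (n_q, n_k)."""
    return q @ k.T / np.sqrt(q.shape[-1])


def spd_log_embed(x, m, eps=1e-3):
    """Map each token to the matrix logarithm of a small SPD matrix.

    Assumption: a d-dim token is reshaped to (m, d // m) and its
    regularised Gram matrix (m x m) serves as the SPD representation.
    """
    n, d = x.shape
    t = x.reshape(n, m, d // m)
    spd = t @ t.transpose(0, 2, 1) / (d // m) + eps * np.eye(m)
    w, v = np.linalg.eigh(spd)  # batched eigendecomposition
    log_spd = (v * np.log(w)[:, None, :]) @ v.transpose(0, 2, 1)
    return log_spd.reshape(n, -1)


def spd_scores(q, k, m):
    """Negative log-Euclidean distance between SPD representations."""
    lq, lk = spd_log_embed(q, m), spd_log_embed(k, m)
    return -np.linalg.norm(lq[:, None, :] - lk[None, :, :], axis=-1)


def grassmann_embed(x, shape):
    """Map each token to the projection matrix of a p-dim subspace of R^m.

    Assumption: a token is reshaped to (m, p) with m > p and the left
    singular vectors give an orthonormal basis of its column space.
    """
    m, p = shape
    n = x.shape[0]
    u, _, _ = np.linalg.svd(x.reshape(n, m, p), full_matrices=False)
    proj = u @ u.transpose(0, 2, 1)  # (n, m, m) projection matrices
    return proj.reshape(n, -1)


def grassmann_scores(q, k, shape):
    """Negative projection distance between subspaces."""
    pq, pk = grassmann_embed(q, shape), grassmann_embed(k, shape)
    return -np.linalg.norm(pq[:, None, :] - pk[None, :, :], axis=-1)


def multi_manifold_attention(q, k, v, m_spd=4, grass_shape=(8, 2)):
    """Fuse the three score maps (here: plain average, an assumption)."""
    scores = (
        euclidean_scores(q, k)
        + spd_scores(q, k, m_spd)
        + grassmann_scores(q, k, grass_shape)
    ) / 3.0
    attn = softmax(scores, axis=-1)
    return attn @ v, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((5, 16)) for _ in range(3))
    out, attn = multi_manifold_attention(q, k, v)
    print(out.shape, attn.shape)
```

In this sketch, distances are negated so that closer pairs receive larger attention weights. A learned fusion of the three maps would be the natural replacement for the plain average used here.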
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Industrial Vision Systems and Defect Detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Byte Pair Encoding · Adam · Label Smoothing · Position-Wise Feed-Forward Layer · Dense Connections
