Multi-manifold Attention for Vision Transformers

Dimitrios Konstantinidis; Ilias Papastratis; Kosmas Dimitropoulos; Petros Daras

arXiv:2207.08569 · cs.CV · October 28, 2024

TL;DR

This paper introduces a multi-manifold multihead attention mechanism for Vision Transformers that models input data across Euclidean, Symmetric Positive Definite, and Grassmann manifolds to enhance attention maps and improve vision task performance.

Contribution

It proposes a novel multi-manifold attention mechanism that integrates multiple geometric representations to refine self-attention in Vision Transformers.

Findings

01. Improved image classification accuracy on benchmark datasets.

02. Enhanced segmentation performance with the new attention mechanism.

03. Demonstrated effectiveness across multiple vision tasks.

Abstract

Vision Transformers are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although their performance has been greatly enhanced through highly descriptive patch embeddings and hierarchical structures, there is still limited research on utilizing additional data representations so as to refine the self-attention map of a Transformer. To address this problem, a novel attention mechanism, called multi-manifold multihead attention, is proposed in this work to substitute the vanilla self-attention of a Transformer. The proposed mechanism models the input space in three distinct manifolds, namely Euclidean, Symmetric Positive Definite and Grassmann, thus leveraging different statistical and geometrical properties of the input for the computation of a highly descriptive attention map. In this…
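
The abstract names the three manifolds but does not spell out how the attention map is computed, so the following is only an illustrative NumPy sketch of the general idea for a single head. Everything specific in it is an assumption made here rather than a detail taken from the paper: the function names, the way each token is reshaped into a matrix, the use of the log-Euclidean distance for the Symmetric Positive Definite branch, the projection distance for the Grassmann branch, and the equal-weight averaging that fuses the three score maps.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def neg_cross_dist(a, b):
    # a: (n, m), b: (n, m) -> (n, n) negative Euclidean distances,
    # so that closer pairs receive larger scores.
    diff = a[:, None, :] - b[None, :, :]
    return -np.linalg.norm(diff, axis=-1)

def to_spd(x, r, eps=1e-3):
    # Assumption: reshape each token (d,) to (r, d // r) and use its
    # regularized Gram matrix as an r x r SPD representation.
    n, d = x.shape
    m = x.reshape(n, r, d // r)
    return m @ m.transpose(0, 2, 1) / (d // r) + eps * np.eye(r)

def spd_log(s):
    # Matrix logarithm of symmetric positive definite matrices
    # via eigendecomposition: V diag(log w) V^T.
    w, v = np.linalg.eigh(s)
    return (v * np.log(w)[:, None, :]) @ v.transpose(0, 2, 1)

def to_projector(x, p):
    # Assumption: reshape each token to (d // p, p), orthonormalize its
    # columns with QR, and represent the spanned subspace (a point on a
    # Grassmann manifold) by its projection matrix Q Q^T.
    n, d = x.shape
    m = x.reshape(n, d // p, p)
    qs = np.stack([np.linalg.qr(mi)[0] for mi in m])
    return qs @ qs.transpose(0, 2, 1)

def multi_manifold_attention(q, k, v, r=4, p=2):
    # q, k, v: (n, d) with d divisible by both r and p.
    n, d = q.shape
    # Euclidean branch: standard scaled dot-product scores.
    e = q @ k.T / np.sqrt(d)
    # SPD branch: negative log-Euclidean distance between token matrices.
    s = neg_cross_dist(spd_log(to_spd(q, r)).reshape(n, -1),
                       spd_log(to_spd(k, r)).reshape(n, -1))
    # Grassmann branch: negative projection distance between subspaces.
    g = neg_cross_dist(to_projector(q, p).reshape(n, -1),
                       to_projector(k, p).reshape(n, -1))
    # Placeholder fusion (assumption): equal-weight average of the maps.
    attn = softmax((e + s + g) / 3.0, axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 5, 16))  # 5 tokens, 16 channels each
out, attn = multi_manifold_attention(q, k, v)
print(out.shape, attn.shape)
```

In this sketch the two non-Euclidean branches contribute distance-based scores that depend on second-order statistics and on subspace geometry rather than on raw dot products, which is one plausible reading of how additional representations could refine the attention map. A learned fusion would be a natural replacement for the fixed average used here.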

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Industrial Vision Systems and Defect Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Byte Pair Encoding · Adam · Label Smoothing · Position-Wise Feed-Forward Layer · Dense Connections