SpectFormer: Frequency and Attention is what you need in a Vision Transformer

Badri N. Patro; Vinay P. Namboodiri; Vijay Srinivas Agneeswaran

arXiv:2304.06446 · cs.CV · April 18, 2023 · 53 cites

TL;DR

SpectFormer introduces a novel transformer architecture that combines spectral and multi-headed attention layers, leading to improved image recognition performance and state-of-the-art results on ImageNet and other datasets.

Contribution

This work proposes the SpectFormer architecture, which integrates spectral and multi-headed attention layers and reports superior performance over existing vision transformers.

Findings

01. SpectFormer-S achieves 84.25% top-1 accuracy on ImageNet-1K.

02. SpectFormer-L achieves 85.7% top-1 accuracy, setting a new state of the art.

03. SpectFormer performs well in transfer learning and downstream tasks like object detection.

Abstract

Vision transformers have been applied successfully to image recognition tasks. These have been either multi-headed self-attention based (ViT \cite{dosovitskiy2020image}, DeiT \cite{touvron2021training}), similar to the original work in textual models, or, more recently, based on spectral layers (FNet \cite{lee2021fnet}, GFNet \cite{rao2021global}, AFNO \cite{guibas2021efficient}). We hypothesize that both spectral and multi-headed attention layers play a major role. We investigate this hypothesis through this work and observe that combining spectral and multi-headed attention layers indeed provides a better transformer architecture. We thus propose the novel SpectFormer architecture for transformers, which combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately and it yields improved…
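The combination described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It makes several assumptions that the abstract does not state: a 1D FFT over the token axis with an elementwise complex filter stands in for the spectral layer (in the spirit of GFNet-style filtering), the spectral blocks are placed before the attention blocks, and layer normalization and MLP sub-blocks are left out. All function and parameter names (`spectral_layer`, `attention_layer`, `hybrid_forward`, `filt`, `wq`, `wk`, `wv`, `wo`) are hypothetical, and the weights are random rather than learned.

```python
import numpy as np


def spectral_layer(x, filt):
    """Token mixing in the frequency domain: FFT, elementwise filter, inverse FFT."""
    n_tokens = x.shape[0]
    freq = np.fft.rfft(x, axis=0)            # (n_tokens // 2 + 1, dim), complex
    return np.fft.irfft(freq * filt, n=n_tokens, axis=0)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)    # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention_layer(x, wq, wk, wv, wo, heads):
    """Multi-headed self-attention over the token axis."""
    n_tokens, dim = x.shape
    head_dim = dim // heads

    def split(t):                            # (n_tokens, dim) -> (heads, n_tokens, head_dim)
        return t.reshape(n_tokens, heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim))
    merged = (weights @ v).transpose(1, 0, 2).reshape(n_tokens, dim)
    return merged @ wo


def hybrid_forward(x, spectral_filters, attention_params, heads):
    """Assumed ordering for this sketch: spectral blocks first, then attention blocks."""
    for filt in spectral_filters:
        x = x + spectral_layer(x, filt)      # residual connection
    for wq, wk, wv, wo in attention_params:
        x = x + attention_layer(x, wq, wk, wv, wo, heads)
    return x


# Toy usage with random (untrained) parameters.
rng = np.random.default_rng(0)
tokens, dim = 16, 8
x = rng.standard_normal((tokens, dim))
filters = [rng.standard_normal((tokens // 2 + 1, dim)) + 0j for _ in range(2)]
params = [tuple(0.1 * rng.standard_normal((dim, dim)) for _ in range(4))]
y = hybrid_forward(x, filters, params, heads=2)
print(y.shape)  # (16, 8)
```

Both block types map a `(tokens, dim)` array to an array of the same shape, which is what lets them be stacked in any proportion. How many blocks of each kind to use, and in what order, is a design choice that this sketch fixes arbitrarily.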

Code & Models

Repositories

Hazqeel09/ellzaf_ml
pytorch

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning

Methods: Balanced Selection