SpectFormer: Frequency and Attention is what you need in a Vision Transformer
Badri N. Patro, Vinay P. Namboodiri, Vijay Srinivas Agneeswaran

TL;DR
SpectFormer introduces a novel transformer architecture that combines spectral and multi-headed attention layers. This yields improved image recognition performance, state-of-the-art results on ImageNet, and strong results in transfer learning and downstream tasks.
Contribution
This work proposes the SpectFormer architecture, which integrates spectral and multi-headed attention layers and outperforms existing vision transformers.
Findings
SpectFormer-S achieves 84.25% top-1 accuracy on ImageNet-1K.
SpectFormer-L achieves 85.7% top-1 accuracy, setting a new state of the art.
SpectFormer performs well in transfer learning and on downstream tasks such as object detection.
Abstract
Vision transformers have been applied successfully to image recognition tasks. Existing models are based either on multi-headed self-attention (ViT \cite{dosovitskiy2020image}, DeiT \cite{touvron2021training}), similar to the original work on textual models, or, more recently, on spectral layers (FNet \cite{lee2021fnet}, GFNet \cite{rao2021global}, AFNO \cite{guibas2021efficient}). We hypothesize that both spectral and multi-headed attention layers play a major role. We investigate this hypothesis in this work and observe that combining spectral and multi-headed attention layers indeed yields a better transformer architecture. We thus propose the novel SpectFormer architecture for transformers, which combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately and that it yields improved…
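The core idea in the abstract is to mix tokens with spectral layers in some blocks and with multi-headed self-attention in others. A minimal NumPy sketch can illustrate it. The sketch makes several assumptions. The spectral layer follows the GFNet style: a 2D FFT over the token grid, element-wise multiplication by a learnable complex filter, then an inverse FFT. The spectral blocks come first and the attention blocks follow. The names `spectral_gating`, `multi_head_attention`, `hybrid_forward` and the split parameter `num_spectral` are hypothetical. The weights are random rather than trained, and MLP sublayers, patch embedding and the classifier head are omitted. This is an illustration, not the authors' implementation.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each token (last axis) to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def spectral_gating(x: np.ndarray, filt: np.ndarray) -> np.ndarray:
    """GFNet-style global filter (assumed form) on a token grid x of shape (H, W, C).

    filt is complex with shape (H, W // 2 + 1, C); in a trained model it is learned.
    """
    h, w, _ = x.shape
    spec = np.fft.rfft2(x, axes=(0, 1))                 # spatial FFT -> (H, W//2+1, C)
    spec = spec * filt                                   # per-frequency, per-channel gating
    return np.fft.irfft2(spec, s=(h, w), axes=(0, 1))   # back to (H, W, C)


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """Standard multi-head self-attention over tokens x of shape (N, D)."""
    n, d = x.shape
    dh = d // num_heads
    # Project, then split channels into heads: (heads, N, dh)
    q, k, v = (
        (x @ w).reshape(n, num_heads, dh).transpose(1, 0, 2) for w in (wq, wk, wv)
    )
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)     # (heads, N, N)
    out = softmax(scores) @ v                            # (heads, N, dh)
    return out.transpose(1, 0, 2).reshape(n, d) @ wo     # merge heads -> (N, D)


def hybrid_forward(x, depth, num_spectral, num_heads, seed=0):
    """Apply num_spectral spectral blocks, then (depth - num_spectral) attention blocks.

    x is a token grid of shape (H, W, C). Each block is pre-norm with a residual
    connection. Returns the output grid and the sequence of block types used.
    """
    rng = np.random.default_rng(seed)
    h, w, c = x.shape
    kinds = []
    for layer in range(depth):
        y = layer_norm(x)
        if layer < num_spectral:
            # Hypothetical random complex filter near 1, standing in for learned weights
            shape = (h, w // 2 + 1, c)
            filt = 1.0 + 0.1 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
            x = x + spectral_gating(y, filt)
            kinds.append("spectral")
        else:
            # Attention runs on the flattened sequence of H * W tokens; random weights
            ws = [rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(4)]
            tokens = multi_head_attention(y.reshape(h * w, c), *ws, num_heads)
            x = x + tokens.reshape(h, w, c)
            kinds.append("attention")
    return x, kinds


# Usage: a 7 x 7 grid of 32-channel tokens through 2 spectral + 4 attention blocks
grid = np.random.default_rng(1).standard_normal((7, 7, 32))
out, kinds = hybrid_forward(grid, depth=6, num_spectral=2, num_heads=4)
print(out.shape)  # (7, 7, 32)
print(kinds)      # ['spectral', 'spectral', 'attention', 'attention', 'attention', 'attention']
```

In this sketch, each spectral block mixes all H·W tokens at FFT cost, O(N log N) in the number of tokens N. Each attention block costs O(N²) but computes input-dependent mixing weights. `num_spectral` is the knob that trades one kind of block for the other.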
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning
Methods: Balanced Selection
