Scattering Vision Transformer: Spectral Mixing Matters

Badri N. Patro; Vijay Srinivas Agneeswaran

arXiv:2311.01310 · cs.CV · November 21, 2023 · 6 citations

TL;DR

The paper introduces the Scattering Vision Transformer (SVT), a novel model that captures detailed image information efficiently using spectral methods, achieving state-of-the-art results with reduced complexity.

Contribution

SVT incorporates spectral scattering and gating mechanisms to improve detail capture and reduce computational complexity in vision transformers.

Findings

1. SVT achieves state-of-the-art accuracy on ImageNet.

2. SVT uses fewer parameters and FLOPs than previous models.

3. SVT performs well in transfer learning and other vision tasks.

Abstract

Vision transformers have gained significant attention and achieved state-of-the-art performance in various computer vision tasks, including image classification, instance segmentation, and object detection. However, challenges remain in addressing attention complexity and effectively capturing fine-grained information within images. Existing solutions often resort to down-sampling operations, such as pooling, to reduce computational cost. Unfortunately, such operations are non-invertible and can result in information loss. In this paper, we present a novel approach called Scattering Vision Transformer (SVT) to tackle these challenges. SVT incorporates a spectrally scattering network that enables the capture of intricate image details. SVT overcomes the invertibility issue associated with down-sampling operations by separating low-frequency and high-frequency components. Furthermore, SVT…
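The invertibility point in the abstract can be illustrated with a toy example. The sketch below is not the paper's scattering network: it assumes a one-level Haar-style split on a plain 1-D list, and the function names `split_bands` and `merge_bands` are hypothetical. It only shows why keeping a high-frequency band alongside the pooled (low-frequency) band makes the down-sampling step reversible, whereas pooling alone maps different inputs to the same output.

```python
def split_bands(signal):
    """Split an even-length 1-D signal into low- and high-frequency halves.

    Assumption: a Haar-style pairwise split stands in for the paper's transform.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    # Pairwise average: identical to average pooling with window 2, stride 2.
    low = [(a + b) / 2 for a, b in pairs]
    # Pairwise half-difference: the detail that pooling alone throws away.
    high = [(a - b) / 2 for a, b in pairs]
    return low, high


def merge_bands(low, high):
    """Invert split_bands: rebuild the original samples from both bands."""
    signal = []
    for l, h in zip(low, high):
        signal.extend([l + h, l - h])
    return signal


x = [4.0, 2.0, 7.0, 7.0, 1.0, 5.0]
y = [3.0, 3.0, 6.0, 8.0, 3.0, 3.0]  # a different signal with the same pairwise averages

low_x, high_x = split_bands(x)
low_y, high_y = split_bands(y)

# Pooling alone cannot distinguish x from y, so it is non-invertible.
pooling_collides = (low_x == low_y)
# With the high-frequency band kept, x is recovered exactly.
roundtrip_ok = (merge_bands(low_x, high_x) == x)
```

Both bands together hold as many values as the input, so nothing is lost; any saving in cost has to come from how each band is processed afterwards, not from discarding one of them.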

Videos

Scattering Vision Transformer: Spectral Mixing Matters · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Remote-Sensing Image Classification

Methods: Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing