Dynamic Spectrum Mixer for Visual Recognition

Zhiqiang Hu; Tao Yu

arXiv:2309.06721 · cs.CV · September 18, 2023 · 1 citation

Open Access

TL;DR

The paper introduces the Dynamic Spectrum Mixer (DSM), a frequency-domain approach for vision backbones that adaptively emphasizes informative spectral bands, improving performance on classification, detection, and segmentation.

Contribution

It proposes a content-adaptive, frequency-domain token interaction method using DCT and dynamic spectrum weights, enhancing adaptability and efficiency over existing MLP and transformer models.

Findings

01 Achieves 83.8% top-1 accuracy on ImageNet

02 Attains 49.9% mIoU on ADE20K

03 Outperforms previous models in classification, detection, and segmentation

Abstract

Recently, MLP-based vision backbones have achieved promising performance in several visual recognition tasks. However, existing MLP-based methods aggregate tokens directly with static weights, so they cannot adapt to different images. Moreover, recent research demonstrates that MLP-Transformers are good at capturing long-range dependencies but ineffective at capturing high frequencies, which primarily carry local information; this limits their applicability to downstream dense prediction tasks such as semantic segmentation. To address these challenges, we propose a content-adaptive yet computationally efficient structure, dubbed Dynamic Spectrum Mixer (DSM). The DSM represents token interactions in the frequency domain by employing the Discrete Cosine Transform, which can learn long-term spatial dependencies with log-linear complexity. Furthermore, a dynamic…
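To make the idea of frequency-domain token mixing concrete, here is a minimal NumPy sketch, not the authors' implementation. It applies a 2D DCT over the spatial token grid, rescales each frequency band, and transforms back. The function names (`dct_matrix`, `dynamic_weights`, `spectrum_mix`), the pooled-feature sigmoid gate, and the projection `proj` are all assumptions made for illustration; the abstract is truncated before it describes how DSM actually computes its dynamic weights. The sketch also uses explicit DCT matrices for readability, which costs quadratic time per axis; the log-linear complexity mentioned in the abstract would require a fast DCT algorithm.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C of shape (n, n), so that C @ C.T == I."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # spatial index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)    # rescale the DC row for orthonormality
    return c


def dynamic_weights(x: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical content-dependent gate (an assumption, not from the paper).

    x:    (H, W, C) token grid
    proj: (C, H*W) projection from pooled features to one gain per band
    Returns gains in (0, 1) with shape (H, W, 1).
    """
    h, w, _ = x.shape
    pooled = x.mean(axis=(0, 1))             # (C,) global average over tokens
    logits = pooled @ proj                   # (H*W,) one logit per frequency
    return (1.0 / (1.0 + np.exp(-logits))).reshape(h, w, 1)


def spectrum_mix(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Mix tokens by reweighting their 2D DCT spectrum.

    x:       (H, W, C) token grid
    weights: broadcastable to (H, W, C), one gain per frequency band
    """
    h, w, _ = x.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    # Forward 2D DCT over the two spatial axes.
    spec = np.einsum("kh,hwc->kwc", ch, x)
    spec = np.einsum("lw,kwc->klc", cw, spec)
    # Emphasize or suppress individual frequency bands.
    spec = spec * weights
    # Inverse 2D DCT: the orthonormal matrices invert by transposition.
    out = np.einsum("kh,klc->hlc", ch, spec)
    out = np.einsum("lw,hlc->hwc", cw, out)
    return out


# Usage: an 8x8 grid of 16-channel tokens with a random, untrained projection.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 8, 16))
gate = dynamic_weights(tokens, rng.standard_normal((16, 64)))
mixed = spectrum_mix(tokens, gate)
```

Because every output token is a weighted sum over all frequency bands, and each band depends on every input token, a single multiplication in the spectrum couples all spatial positions. With all gains set to one, the operation reduces to the identity.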

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Computing and Algorithms · Advanced Image and Video Retrieval Techniques

Methods: Discrete Cosine Transform