Plug n' Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers
Xuwei Xu, Sen Wang, Yudong Chen, Jiajun Liu

TL;DR
This paper introduces a channel shuffle module for tiny Vision Transformers that improves their top-1 accuracy on ImageNet-1K by up to 2.8% at minimal additional computational cost (under 0.03 GMACs), making them better suited to resource-constrained environments.
Contribution
The paper proposes a novel channel shuffle module that enhances tiny ViTs by enabling effective information exchange between feature groups with only a negligible increase in computational complexity.
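To make "information exchange between feature groups" concrete, here is a minimal sketch of the generic channel shuffle operation as popularized by ShuffleNet. It is not the paper's module, whose design is not described in this summary. The function names `channel_shuffle_order` and `shuffle_token` are hypothetical, and a token is assumed to be a flat list of channel values split into equal-sized contiguous groups.

```python
def channel_shuffle_order(num_channels, groups):
    """Return the source index for each output channel after a shuffle.

    Channels are viewed as a (groups, per_group) grid, the grid is
    transposed, and the result is flattened. Consecutive output channels
    therefore come from different groups.
    """
    if groups <= 0 or num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    per_group = num_channels // groups
    # Outer loop walks positions inside a group, inner loop walks groups,
    # which is exactly the transposed read order.
    return [g * per_group + i for i in range(per_group) for g in range(groups)]


def shuffle_token(token, groups):
    """Apply the channel shuffle to one token (a flat list of channel values)."""
    order = channel_shuffle_order(len(token), groups)
    return [token[j] for j in order]


if __name__ == "__main__":
    # Two groups of three channels: [a, b, c] and [d, e, f].
    print(shuffle_token(["a", "b", "c", "d", "e", "f"], groups=2))
```

The shuffle is a pure index permutation with no multiplications or additions, so on its own it adds essentially no multiply-accumulate operations. Whether this accounts for the reported overhead of under 0.03 GMACs is an assumption, since the summary does not say what else the module contains.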
Findings
Up to 2.8% top-1 accuracy improvement on ImageNet-1K
Minimal increase in computational complexity (<0.03 GMACs)
Effective enhancement of tiny ViTs using pure self-attention mechanisms
Abstract
Vision Transformers (ViTs) have demonstrated remarkable performance in various computer vision tasks. However, the high computational complexity hinders ViTs' applicability on devices with limited memory and computing resources. Although certain investigations have delved into the fusion of convolutional layers with self-attention mechanisms to enhance the efficiency of ViTs, there remains a knowledge gap in constructing tiny yet effective ViTs solely based on the self-attention mechanism. Furthermore, the straightforward strategy of reducing the feature channels in a large but outperforming ViT often results in significant performance degradation despite improved efficiency. To address these challenges, we propose a novel channel shuffle module to improve tiny-size ViTs, showing the potential of pure self-attention models in environments with constrained computing resources. Inspired…
Taxonomy
Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Visual Attention and Saliency Detection
Methods: Channel Shuffle
