ToaSt: Token Channel Selection and Structured Pruning for Efficient ViT
Hyunchan Moon, Cheonjun Park, Steven L. Waslander

TL;DR
ToaSt introduces a decoupled framework for efficient Vision Transformer pruning, combining head-wise structured pruning and token channel selection to significantly reduce computational costs while maintaining high accuracy.
Contribution
It proposes a novel decoupled approach applying specialized strategies to different ViT components, improving efficiency and robustness over existing methods.
Findings
Achieves 39.4% FLOPs reduction on ViT-MAE-Huge with 88.52% accuracy.
Outperforms baselines in accuracy-efficiency trade-offs across nine models.
Transfers effectively to downstream tasks like object detection.
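A FLOPs reduction figure like the one above can be sanity-checked with a rough per-block count. The sketch below is an assumption-laden illustration, not the paper's accounting: it counts multiply-accumulates for a standard ViT encoder block and ignores biases, LayerNorm, softmax, and activations. The function names and the `keep_heads` / `keep_ffn` parameters are hypothetical, and shrinking the FFN hidden width stands in for any FFN compression; it is not Token Channel Selection itself.

```python
def vit_block_macs(num_tokens, dim, mlp_ratio=4.0, keep_heads=1.0, keep_ffn=1.0):
    """Rough multiply-accumulate count for one standard ViT encoder block.

    keep_heads / keep_ffn are hypothetical knobs: the fraction of attention
    width and of FFN hidden channels left after pruning (1.0 = unpruned).
    """
    n, d = num_tokens, dim
    d_attn = d * keep_heads
    qkv_and_out = 4 * n * d * d_attn    # Q, K, V and output projections
    attn_matmuls = 2 * n * n * d_attn   # QK^T and attention-weighted V
    hidden = d * mlp_ratio * keep_ffn
    ffn = 2 * n * d * hidden            # the two FFN linear layers
    return {"mhsa": qkv_and_out + attn_matmuls, "ffn": ffn}


def ffn_share(macs):
    """Fraction of the block's compute spent in the FFN."""
    return macs["ffn"] / (macs["mhsa"] + macs["ffn"])


def reduction(base, pruned):
    """Relative compute saved by the pruned block versus the base block."""
    total = lambda m: m["mhsa"] + m["ffn"]
    return 1.0 - total(pruned) / total(base)


# Assumed ViT-Base-like shape: 196 patches + 1 class token, width 768.
base = vit_block_macs(num_tokens=197, dim=768)
half_ffn = vit_block_macs(num_tokens=197, dim=768, keep_ffn=0.5)
print(f"FFN share: {ffn_share(base):.3f}")
print(f"Saving from halving FFN width: {reduction(base, half_ffn):.3f}")
```

Under these assumptions the FFN accounts for roughly 64% of a block's compute, which is why compressing it has the largest effect on the total.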
Abstract
Vision Transformers (ViTs) have achieved remarkable success across various vision tasks, yet their deployment is often hindered by prohibitive computational costs. While structured weight pruning and token compression have emerged as promising solutions, they suffer from prolonged retraining times and global propagation that creates optimization challenges, respectively. We propose ToaSt, a decoupled framework applying specialized strategies to distinct ViT components. We apply coupled head-wise structured pruning to Multi-Head Self-Attention modules, leveraging attention operation characteristics to enhance robustness. For Feed-Forward Networks (over 60% of FLOPs), we introduce Token Channel Selection (TCS) that enhances compression ratios while avoiding global propagation issues. Our analysis reveals TCS effectively filters redundant noise during selection. Extensive evaluations…
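The abstract does not spell out the pruning procedure, so the following is only a minimal sketch of the generic idea behind head-wise structured pruning: whole attention heads are removed consistently from the query, key, value, and output projections, which leaves smaller dense matrices rather than sparse ones. The function names are hypothetical, and the scoring rule in `head_scores` is a placeholder assumption, not the criterion used by ToaSt.

```python
import numpy as np


def attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Plain multi-head self-attention without biases. x: (tokens, dim)."""
    n = x.shape[0]
    head_dim = w_q.shape[1] // num_heads

    def split(t):
        # (tokens, heads * head_dim) -> (heads, tokens, head_dim)
        return t.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    logits = logits - logits.max(axis=-1, keepdims=True)  # stable softmax
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    merged = (probs @ v).transpose(1, 0, 2).reshape(n, num_heads * head_dim)
    return merged @ w_o


def head_scores(w_v, w_o, num_heads):
    """Placeholder importance: Frobenius norm of each head's W_v @ W_o.
    This criterion is an assumption made for illustration only."""
    head_dim = w_v.shape[1] // num_heads
    return np.array([
        np.linalg.norm(w_v[:, h * head_dim:(h + 1) * head_dim]
                       @ w_o[h * head_dim:(h + 1) * head_dim, :])
        for h in range(num_heads)
    ])


def prune_heads(w_q, w_k, w_v, w_o, num_heads, keep):
    """Drop every head not listed in `keep` from all four projections."""
    head_dim = w_q.shape[1] // num_heads
    idx = np.concatenate([np.arange(h * head_dim, (h + 1) * head_dim)
                          for h in sorted(keep)])
    return w_q[:, idx], w_k[:, idx], w_v[:, idx], w_o[idx, :]


rng = np.random.default_rng(0)
dim, heads = 16, 4
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
w_o = rng.normal(size=(dim, dim))
x = rng.normal(size=(5, dim))

keep = np.argsort(head_scores(w_v, w_o, heads))[-2:]  # keep the top 2 heads
pruned = prune_heads(w_q, w_k, w_v, w_o, heads, keep)
y_pruned = attention(x, *pruned, num_heads=len(keep))
```

Because each head's output passes only through its own rows of the output projection, the pruned layer computes exactly what the full layer would with the removed heads zeroed out. Any accuracy loss comes from discarding those heads' contributions, not from the restructuring.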
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Image Enhancement Techniques
