Automatic Channel Pruning for Multi-Head Attention
Eunho Lee, Youngbae Hwang

TL;DR
This paper introduces an automatic channel pruning method tailored to multi-head attention in Transformers. It reduces computation while maintaining accuracy, and it applies to both original and linear attention mechanisms.
Contribution
The proposed method addresses channel misalignment in multi-head attention pruning by adding channel similarity-based weights to the pruning indicator and by removing channels in equal proportions across all heads.
Findings
Outperforms previous models on ImageNet-1K in accuracy and efficiency.
Effective pruning of multi-head attention with minimal accuracy loss.
Applicable to both original and linear attention mechanisms.
Abstract
Despite the strong performance of Transformers, their quadratic computational complexity presents challenges in applying them to vision tasks. Automatic pruning is one of the effective methods for reducing computational complexity without heuristic approaches. However, directly applying it to multi-head attention is not straightforward due to channel misalignment. In this paper, we propose an automatic channel pruning method that takes the multi-head attention mechanism into account. First, we incorporate channel similarity-based weights into the pruning indicator to preserve more informative channels in each head. Then, we adjust the pruning indicator to enforce removal of channels in equal proportions across all heads, preventing channel misalignment. We also add a reweight module to compensate for information loss resulting from channel removal, and an effective initialization step for pruning…
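The two steps named in the abstract (similarity-based weighting of the pruning indicator, then equal-proportion removal across heads) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `similarity_weights` and `equal_per_head_keep_mask`, the use of mean absolute cosine similarity as the redundancy measure, and the multiplicative combination with the indicator are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
import numpy as np

def similarity_weights(feats):
    """Hypothetical weight: channels less similar to the others in the
    same head get a larger weight. feats has shape (heads, channels, tokens)."""
    norm = feats / (np.linalg.norm(feats, axis=-1, keepdims=True) + 1e-8)
    sim = np.abs(norm @ norm.transpose(0, 2, 1))       # (heads, channels, channels)
    num_channels = sim.shape[-1]
    self_sim = np.diagonal(sim, axis1=1, axis2=2)      # (heads, channels)
    mean_sim = (sim.sum(axis=-1) - self_sim) / (num_channels - 1)
    return 1.0 - mean_sim                              # (heads, channels)

def equal_per_head_keep_mask(scores, keep_ratio):
    """Keep the same number of top-scoring channels in every head, so the
    per-head channel counts stay aligned. scores has shape (heads, channels)."""
    num_heads, num_channels = scores.shape
    k = max(1, int(round(num_channels * keep_ratio)))
    top = np.argsort(-scores, axis=1)[:, :k]           # indices of the k best per head
    mask = np.zeros((num_heads, num_channels), dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask

# Toy example: 4 heads, 8 channels per head, 16 tokens.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 16))
indicator = rng.random((4, 8))                         # stand-in for a learned pruning indicator
weights = similarity_weights(feats)
mask = equal_per_head_keep_mask(indicator * weights, keep_ratio=0.5)
print(mask.sum(axis=1))                                # number of channels kept in each head
```

Selecting the top channels per head, rather than globally over all heads, is what keeps the head dimensions equal after pruning. A global threshold could leave different heads with different channel counts, which is one way the misalignment described in the abstract can arise.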
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Speech and Audio Processing · Blind Source Separation Techniques
Methods: Attention Is All You Need · Softmax · Pruning · Linear Layer · Multi-Head Attention
