Automatic Channel Pruning for Multi-Head Attention

Eunho Lee; Youngbae Hwang

arXiv:2405.20867·cs.CV·June 3, 2024

TL;DR

This paper introduces an automatic channel pruning method tailored to multi-head attention in Transformers. It reduces computation while maintaining accuracy and applies to both original and linear attention mechanisms.

Contribution

The proposed method addresses channel misalignment in multi-head attention pruning by incorporating channel similarity-based weights into the pruning indicator and removing channels in equal proportions across all heads.

Findings

01. Outperforms previous models on ImageNet-1K in accuracy and efficiency.

02. Effective pruning of multi-head attention with minimal accuracy loss.

03. Applicable to both original and linear attention mechanisms.

Abstract

Despite the strong performance of Transformers, their quadratic computational complexity presents challenges in applying them to vision tasks. Automatic pruning is one of the effective methods for reducing computational complexity without heuristic approaches. However, directly applying it to multi-head attention is not straightforward due to channel misalignment. In this paper, we propose an automatic channel pruning method that takes the multi-head attention mechanism into account. First, we incorporate channel similarity-based weights into the pruning indicator to preserve more informative channels in each head. Then, we adjust the pruning indicator to enforce removal of channels in equal proportions across all heads, preventing channel misalignment. We also add a reweight module to compensate for information loss resulting from channel removal, and an effective initialization step for pruning…
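The two steps named in the abstract (similarity-based weighting of the pruning indicator, then equal removal across heads) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the indicator or the weight formula, so the weight used here (one minus the mean absolute cosine similarity to the other channels in the same head) is an assumption, and the names `similarity_weights`, `select_channels`, and `keep_per_head` are hypothetical. The reweight module and the initialization step are not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_weights(channels):
    """Assumed weighting: channels that resemble the rest of their head score lower."""
    n = len(channels)
    weights = []
    for i in range(n):
        sims = [abs(cosine(channels[i], channels[j])) for j in range(n) if j != i]
        mean_sim = sum(sims) / len(sims) if sims else 0.0
        weights.append(1.0 - mean_sim)
    return weights


def select_channels(importance, head_channels, keep_per_head):
    """Keep the same number of channels in every head.

    importance:    per-head lists of raw channel scores (hypothetical indicator values)
    head_channels: per-head lists of channel vectors used to measure similarity
    Returns the sorted indices of the kept channels for each head.
    """
    kept = []
    for scores, channels in zip(importance, head_channels):
        weights = similarity_weights(channels)
        weighted = [s * w for s, w in zip(scores, weights)]
        order = sorted(range(len(weighted)), key=lambda c: weighted[c], reverse=True)
        kept.append(sorted(order[:keep_per_head]))
    return kept


# Toy example: 2 heads, 3 channels per head, keep 2 channels in each head.
importance = [[0.9, 0.8, 0.5], [0.2, 0.9, 0.9]]
head_channels = [
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # channels 0 and 1 are duplicates
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
]
kept = select_channels(importance, head_channels, keep_per_head=2)
print(kept)
```

In the first toy head, channels 0 and 1 are identical, so the redundant copy with the lower score is dropped even though its raw score is higher than that of channel 2. Because every head keeps the same number of channels, the pruned heads still have matching dimensions, which is the alignment property the abstract refers to.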

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Speech and Audio Processing · Blind Source Separation Techniques

Methods: Attention Is All You Need · Softmax · Pruning · Linear Layer · Multi-Head Attention