Multi-Dimensional Model Compression of Vision Transformer
Zejiang Hou, Sun-Yuan Kung

TL;DR
This paper introduces a multi-dimensional compression method for Vision Transformers that jointly prunes attention heads, neurons, and sequence components, significantly reducing computational costs while maintaining accuracy.
Contribution
It proposes a novel multi-dimensional pruning framework with a statistical dependence criterion and an optimized pruning policy via Gaussian process search.
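The budgeted policy search can be sketched as a constrained optimization. The notation is illustrative and not necessarily the paper's: $r_h$, $r_n$, and $r_s$ stand for the pruning ratios along the attention-head, neuron, and sequence dimensions, $f(r_h, r_n, r_s)$ for the compressed model, $\mathrm{Acc}$ for its accuracy, and $B$ for the computational budget.

```latex
\max_{r_h,\, r_n,\, r_s \in [0, 1)} \; \mathrm{Acc}\big(f(r_h, r_n, r_s)\big)
\quad \text{subject to} \quad
\mathrm{FLOPs}\big(f(r_h, r_n, r_s)\big) \le B
```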
Findings
Reduces FLOPs by 40% on DeiT and T2T-ViT models without accuracy loss.
Outperforms previous state-of-the-art pruning methods.
Effectively balances model compression and accuracy.
Abstract
Vision transformers (ViT) have recently attracted considerable attention, but their huge computational cost remains an issue for practical deployment. Previous ViT pruning methods tend to prune the model along a single dimension, which may require excessive reduction in that dimension and lead to sub-optimal model quality. In contrast, we advocate a multi-dimensional ViT compression paradigm, and propose to harness the redundancy reduction from the attention head, neuron, and sequence dimensions jointly. We first propose a statistical dependence based pruning criterion that is generalizable to different dimensions for identifying deleterious components. Moreover, we cast the multi-dimensional compression as an optimization, learning the optimal pruning policy across the three dimensions that maximizes the compressed model's accuracy under a computational budget. The problem is solved by our adapted…
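To make the computational budget concrete, the following is a minimal sketch of how the three pruning dimensions affect the cost of a single transformer encoder block. It is not the paper's cost model: the function names are hypothetical, the default shapes (197 tokens, width 384, 6 heads of size 64, MLP width 1536) are assumed DeiT-Small-like values, pruning is assumed uniform, and LayerNorm, softmax, and patch embedding costs are ignored.

```python
def block_flops(tokens, dim, heads, head_dim, mlp_hidden):
    """Approximate multiply-accumulate count of one transformer encoder block."""
    inner = heads * head_dim                # total width of the attention heads
    qkv = 3 * tokens * dim * inner          # query, key, and value projections
    scores = tokens * tokens * inner        # Q @ K^T summed over all heads
    weighted = tokens * tokens * inner      # attention weights @ V
    proj = tokens * inner * dim             # output projection back to `dim`
    mlp = 2 * tokens * dim * mlp_hidden     # the two linear layers of the MLP
    return qkv + scores + weighted + proj + mlp


def kept_flops_fraction(keep_heads, keep_neurons, keep_tokens,
                        tokens=197, dim=384, heads=6, head_dim=64,
                        mlp_hidden=1536):
    """Fraction of the dense block's cost left after pruning.

    Each keep_* argument is the fraction retained along one dimension
    (attention heads, MLP neurons, sequence tokens). Default shapes are
    illustrative assumptions, not values taken from the paper.
    """
    dense = block_flops(tokens, dim, heads, head_dim, mlp_hidden)
    pruned = block_flops(
        max(1, round(keep_tokens * tokens)),
        dim,
        max(1, round(keep_heads * heads)),
        head_dim,
        max(1, round(keep_neurons * mlp_hidden)),
    )
    return pruned / dense


if __name__ == "__main__":
    # Compare single-dimension policies with a joint one (hypothetical ratios).
    policies = {
        "heads only": (0.5, 1.0, 1.0),
        "neurons only": (1.0, 0.5, 1.0),
        "tokens only": (1.0, 1.0, 0.6),
        "joint": (0.83, 0.8, 0.8),
    }
    for name, policy in policies.items():
        print(f"{name}: {kept_flops_fraction(*policy):.3f} of dense cost")
```

Because the sequence length enters the attention terms quadratically and every other term linearly, a joint policy can meet a given budget with a milder reduction in each dimension than a single-dimension policy would need.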
Taxonomy
Topics: Image Enhancement Techniques · CCD and CMOS Imaging Sensors · Advanced Vision and Imaging
Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Linear Layer · Layer Normalization · Residual Connection · Softmax · Gaussian Process · Attention Dropout · Dense Connections
