Multi-Dimensional Model Compression of Vision Transformer

Zejiang Hou; Sun-Yuan Kung

arXiv:2201.00043·cs.CV·January 4, 2022

TL;DR

This paper introduces a multi-dimensional compression method for Vision Transformers that jointly prunes attention heads, neurons, and sequence components, significantly reducing computational costs while maintaining accuracy.

Contribution

It proposes a novel multi-dimensional pruning framework with a statistical dependence criterion and an optimized pruning policy via Gaussian process search.
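
As a rough illustration of dependence-based scoring, the sketch below ranks candidate components by how strongly their activations co-vary with the target. It is not the paper's criterion: absolute Pearson correlation is swapped in as a simple stand-in for statistical dependence, it assumes weakly dependent components are pruned first, and the names `dependence_scores` and `components_to_prune` and the synthetic data are hypothetical.

```python
import numpy as np

def dependence_scores(features, targets):
    """Absolute Pearson correlation of each feature column with the targets.

    Stand-in for a statistical dependence measure (assumption, not the paper's).
    features: (num_samples, num_components), targets: (num_samples,)
    """
    f = features - features.mean(axis=0)
    t = targets - targets.mean()
    cov = f.T @ t / len(t)
    denom = f.std(axis=0) * t.std() + 1e-12  # epsilon guards constant columns
    return np.abs(cov / denom)

def components_to_prune(scores, num_prune):
    """Indices of the num_prune components with the weakest dependence."""
    return np.argsort(scores)[:num_prune]

# Synthetic example: component 1 is pure noise, the others carry signal.
rng = np.random.default_rng(0)
labels = rng.normal(size=500)
features = np.stack(
    [
        labels + 0.1 * rng.normal(size=500),  # strongly dependent
        rng.normal(size=500),                 # independent of the labels
        0.5 * labels + rng.normal(size=500),  # moderately dependent
    ],
    axis=1,
)
scores = dependence_scores(features, labels)
prune_idx = components_to_prune(scores, num_prune=1)
```

The same scoring function can be applied to any dimension once its components are summarized as columns (for example, one column per head, neuron, or token position), which is the sense in which a single criterion generalizes across dimensions.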

Findings

01 Reduces FLOPs by 40% on DeiT and T2T-ViT models without accuracy loss.

02 Outperforms previous state-of-the-art pruning methods.

03 Effectively balances model compression and accuracy.

Abstract

Vision transformers (ViT) have recently attracted considerable attention, but their huge computational cost remains an issue for practical deployment. Previous ViT pruning methods tend to prune the model along a single dimension, which may require excessive reduction in that dimension and lead to sub-optimal model quality. In contrast, we advocate a multi-dimensional ViT compression paradigm and propose to harness the redundancy reduction from the attention head, neuron, and sequence dimensions jointly. We first propose a statistical-dependence-based pruning criterion that generalizes to different dimensions for identifying deleterious components. Moreover, we cast the multi-dimensional compression as an optimization, learning the optimal pruning policy across the three dimensions that maximizes the compressed model's accuracy under a computational budget. The problem is solved by our adapted…
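
To make the computational budget concrete, here is a minimal sketch that estimates how the cost of one encoder block changes with a pruning policy over the three dimensions. It is an assumption-laden toy count, not the paper's cost model: the block shape is DeiT-S-like, the same keep ratios are applied uniformly, LayerNorm, softmax, and biases are ignored, and the names `block_macs`, `cost_ratio`, and `within_budget` are hypothetical. The 0.6 budget mirrors the 40% FLOPs reduction reported above.

```python
# Illustrative DeiT-S-like block shape (assumed for this sketch).
TOKENS, DIM, HEADS, HEAD_DIM, MLP_HIDDEN = 197, 384, 6, 64, 1536

def block_macs(tokens, heads, mlp_hidden, dim=DIM, head_dim=HEAD_DIM):
    """Multiply-accumulates of one encoder block (linear layers and attention only)."""
    inner = heads * head_dim
    qkv = 3 * tokens * dim * inner      # query, key, value projections
    attn = 2 * tokens * tokens * inner  # QK^T scores plus attention-weighted values
    proj = tokens * inner * dim         # attention output projection
    mlp = 2 * tokens * dim * mlp_hidden # two MLP linear layers
    return qkv + attn + proj + mlp

def kept(count, ratio):
    """Number of components kept, rounded half up, with at least one left."""
    return max(1, int(count * ratio + 0.5))

def cost_ratio(keep_heads, keep_neurons, keep_tokens):
    """Cost of the pruned block relative to the dense block."""
    pruned = block_macs(
        kept(TOKENS, keep_tokens),
        kept(HEADS, keep_heads),
        kept(MLP_HIDDEN, keep_neurons),
    )
    return pruned / block_macs(TOKENS, HEADS, MLP_HIDDEN)

def within_budget(policy, budget=0.6):
    """True if a (heads, neurons, tokens) keep-ratio policy meets the budget."""
    return cost_ratio(*policy) <= budget

if __name__ == "__main__":
    for policy in [(1.0, 1.0, 0.6), (0.5, 1.0, 1.0), (0.83, 0.8, 0.85)]:
        print(policy, round(cost_ratio(*policy), 3), within_budget(policy))
```

In this toy count, halving the attention heads alone does not meet the 0.6 budget because the MLP cost is untouched, which illustrates why a single-dimension policy may need excessive reduction. A policy search would evaluate accuracy only for policies that pass `within_budget`.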

Code & Models

Repositories

zejiangh/Multi-dimensional-vit-compression (PyTorch)

Taxonomy

Topics: Image Enhancement Techniques · CCD and CMOS Imaging Sensors · Advanced Vision and Imaging

Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Linear Layer · Layer Normalization · Residual Connection · Softmax · Gaussian Process · Attention Dropout · Dense Connections