UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers

Dachuan Shi; Chaofan Tao; Ying Jin; Zhendong Yang; Chun Yuan; Jiaqi Wang

arXiv:2301.13741·cs.CV·July 3, 2023·6 cites

TL;DR

UPop introduces a unified, progressive pruning framework for compressing vision-language Transformers, automatically optimizing subnet structures and ratios to achieve high compression without significant performance loss.

Contribution

It proposes a novel universal pruning method that combines continuous search and progressive retraining for multimodal Transformer compression.

Findings

01. Effective across various tasks and datasets

02. Achieves high compression ratios with maintained accuracy

03. Versatile for different model architectures

Abstract

Real-world data contains a vast amount of multimodal information, among which vision and language are the two most representative modalities. Moreover, increasingly heavy models, e.g., Transformers, have drawn researchers' attention to model compression. However, how to compress multimodal models, especially vision-language Transformers, remains under-explored. This paper proposes Unified and Progressive Pruning (UPop) as a universal vision-language Transformer compression framework, which incorporates 1) unified search of multimodal subnets in a continuous optimization space from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures; 2) progressive search and retraining of the subnet, which maintains convergence between the search and…
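The two ideas named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the function names, the fixed importance scores, and the linear ratio schedule are all assumptions made for illustration. The sketch ranks prunable units from both modalities in one shared pool, so per-modality pruning ratios fall out of the ranking instead of being set by hand, and it raises the overall pruning ratio in small steps rather than all at once.

```python
# Toy sketch only. All names, scores, and the linear schedule are hypothetical.

def unified_prune_mask(scores, ratio):
    """Prune the globally lowest-scoring fraction `ratio` of units.

    scores: dict mapping a modality name to a list of importance scores.
    Returns a dict mapping each modality to a list of 1 (keep) / 0 (prune).
    """
    # Pool every unit from every modality into one ranking.
    flat = [(s, m, i) for m, vals in scores.items() for i, s in enumerate(vals)]
    flat.sort(key=lambda t: t[0])
    n_prune = int(len(flat) * ratio)
    keep = {m: [1] * len(vals) for m, vals in scores.items()}
    for _, m, i in flat[:n_prune]:
        keep[m][i] = 0
    return keep


def modality_ratios(keep):
    """Fraction of units pruned in each modality."""
    return {m: 1 - sum(flags) / len(flags) for m, flags in keep.items()}


def progressive_schedule(target_ratio, num_steps):
    """Assumed linear ramp of the overall pruning ratio up to the target."""
    return [target_ratio * (t + 1) / num_steps for t in range(num_steps)]


# Hypothetical importance scores for four units per modality.
scores = {
    "vision": [0.9, 0.1, 0.25, 0.2],
    "language": [0.7, 0.6, 0.55, 0.3],
}

for ratio in progressive_schedule(target_ratio=0.5, num_steps=4):
    keep = unified_prune_mask(scores, ratio)
    # A real pipeline would retrain and update the scores here;
    # this sketch keeps them fixed for simplicity.
    print(ratio, modality_ratios(keep))
```

With these made-up scores, the final step prunes half of all units overall, but the shared ranking removes three of the four vision units and only one of the four language units. This is the sense in which a unified search can assign pruning ratios across modalities automatically.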

Code & Models

Videos

UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Pruning · Linear Layer · Byte Pair Encoding · Layer Normalization · Label Smoothing · Adam · Multi-Head Attention · Residual Connection · Dense Connections