UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers
Dachuan Shi, Chaofan Tao, Ying Jin, Zhendong Yang, Chun Yuan, Jiaqi Wang

TL;DR
UPop introduces a unified, progressive pruning framework for compressing vision-language Transformers, automatically optimizing subnet structures and ratios to achieve high compression without significant performance loss.
Contribution
It proposes a novel universal pruning method that combines continuous search and progressive retraining for multimodal Transformer compression.
Findings
Effective across various tasks and datasets
Achieves high compression ratios with maintained accuracy
Versatile for different model architectures
Abstract
Real-world data contains a vast amount of multimodal information, among which vision and language are the two most representative modalities. Moreover, increasingly heavy models, e.g., Transformers, have drawn researchers' attention to model compression. However, how to compress multimodal models, especially vision-language Transformers, is still under-explored. This paper proposes Unified and Progressive Pruning (UPop) as a universal vision-language Transformer compression framework, which incorporates 1) a unified search for multimodal subnets in a continuous optimization space from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures; 2) progressive search and retraining of the subnet, which maintains convergence between the search and…
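The two ideas named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (unified_prune_set, progressive_masks), the importance scores, and the linear fade schedule are all assumptions made for illustration. It ranks prunable units from both modalities in one pool, so each modality's pruning ratio falls out of the ranking rather than being set by hand, and it shrinks the masks of the selected units gradually instead of zeroing them in one step. In a real setting the scores would be learned and updated between steps; here they are fixed.

```python
# Toy sketch only: all names, scores, and the linear schedule are hypothetical,
# not taken from the UPop paper or its code.

def unified_prune_set(scores, ratio):
    """Rank units from all modalities together; return the lowest-scoring fraction."""
    flat = [(s, name, i) for name, vals in scores.items() for i, s in enumerate(vals)]
    flat.sort(key=lambda t: t[0])
    k = int(len(flat) * ratio)
    return {(name, i) for _, name, i in flat[:k]}


def progressive_masks(scores, target_ratio, steps):
    """Yield one mask dict per step; selected units fade linearly from 1 to 0."""
    for t in range(1, steps + 1):
        # With learned scores, this set would be recomputed from updated values.
        pruned = unified_prune_set(scores, target_ratio)
        strength = 1.0 - t / steps  # assumed linear schedule
        yield {
            name: [strength if (name, i) in pruned else 1.0 for i in range(len(vals))]
            for name, vals in scores.items()
        }


# Hypothetical importance scores for four units in each modality.
scores = {
    "vision": [0.9, 0.1, 0.5, 0.2],
    "language": [0.8, 0.7, 0.05, 0.6],
}
masks = list(progressive_masks(scores, target_ratio=0.5, steps=4))
for step, m in enumerate(masks, start=1):
    print(step, m)
```

With these made-up scores, a 50% overall target removes three of the four vision units but only one of the four language units, which is the sense in which a shared ranking assigns per-modality ratios automatically.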
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Pruning · Linear Layer · Byte Pair Encoding · Layer Normalization · Label Smoothing · Adam · Multi-Head Attention · Residual Connection · Dense Connections
