PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
Ye Li, Chen Tang, Yuan Meng, Jiajun Fan, Zenghao Chai, Xinzhu Ma, Zhi, Wang, Wenwu Zhu

TL;DR
PRANCE is a novel framework that jointly optimizes token reduction and channel pruning in Vision Transformers, achieving significant computational savings while maintaining accuracy through a meta-network and reinforcement learning strategies.
Contribution
It introduces a joint token-optimization and structural pruning framework for ViTs, utilizing a meta-network and reinforcement learning with a new training mechanism for efficient inference.
Findings
Reduces FLOPs by about 50% without accuracy loss
Retains only 10% of tokens during inference
Compatible with various token optimization techniques
Abstract
We introduce PRANCE, a Vision Transformer compression framework that jointly optimizes the activated channels and reduces tokens, based on the characteristics of inputs. Specifically, PRANCE~ leverages adaptive token optimization strategies for a certain computational budget, aiming to accelerate ViTs' inference from a unified data and architectural perspective. However, the joint framework poses challenges to both architectural and decision-making aspects. Firstly, while ViTs inherently support variable-token inference, they do not facilitate dynamic computations for variable channels. To overcome this limitation, we propose a meta-network using weight-sharing techniques to support arbitrary channels of the Multi-head Self-Attention and Multi-layer Perceptron layers, serving as a foundational model for architectural decision-making. Second, simultaneously optimizing the structure of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDigital Media Forensic Detection · Advanced Steganography and Watermarking Techniques · Advanced Image Processing Techniques
MethodsAttention Is All You Need · Softmax · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Dropout
