PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference

Ye Li; Chen Tang; Yuan Meng; Jiajun Fan; Zenghao Chai; Xinzhu Ma; Zhi Wang; Wenwu Zhu

arXiv:2407.05010·cs.CV·July 9, 2024

TL;DR

PRANCE is a novel framework that jointly optimizes token reduction and channel pruning in Vision Transformers, achieving significant computational savings while maintaining accuracy through a meta-network and reinforcement learning strategies.

Contribution

The paper introduces a joint token-optimization and structural channel-pruning framework for ViTs that uses a weight-sharing meta-network and reinforcement learning, together with a new training mechanism, for efficient inference.

Findings

01. Reduces FLOPs by about 50% without accuracy loss

02. Retains only 10% of tokens during inference

03. Compatible with various token optimization techniques
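
To see why reducing tokens and channels together lowers compute, the multiply-accumulate (MAC) count of one ViT block can be estimated with the standard per-block formula. This is a rough sketch, not the paper's accounting: it ignores LayerNorm, softmax, and biases, the function name `block_macs` is hypothetical, and the 50% token and 75% channel settings are illustrative assumptions rather than a configuration reported for PRANCE.

```python
def block_macs(num_tokens: int, dim: int, attn_dim: int, mlp_hidden: int) -> int:
    """Approximate MACs of one ViT block.

    dim        -- embedding width (kept fixed here)
    attn_dim   -- total internal width of the attention heads (prunable)
    mlp_hidden -- hidden width of the MLP (prunable)
    """
    projections = 4 * num_tokens * dim * attn_dim       # Q, K, V and output projections
    attention = 2 * num_tokens * num_tokens * attn_dim  # QK^T scores and weighted sum over V
    mlp = 2 * num_tokens * dim * mlp_hidden             # the two linear layers of the MLP
    return projections + attention + mlp


# ViT-B/16 at 224x224: 196 patch tokens + 1 class token, width 768, MLP hidden 3072.
full = block_macs(197, 768, 768, 3072)

# Assumed, illustrative reduction: keep half the tokens and 75% of the internal channels.
reduced = block_macs(98, 768, 576, 2304)

print(f"full block:    {full:,} MACs")
print(f"reduced block: {reduced:,} MACs ({reduced / full:.1%} of full)")
```

The projection and MLP terms scale linearly with both token count and channel width, and the attention term scales quadratically with token count, so the two kinds of reduction compound.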

Abstract

We introduce PRANCE, a Vision Transformer compression framework that jointly optimizes the activated channels and reduces tokens, based on the characteristics of inputs. Specifically, PRANCE leverages adaptive token optimization strategies for a certain computational budget, aiming to accelerate ViTs' inference from a unified data and architectural perspective. However, the joint framework poses challenges to both architectural and decision-making aspects. First, while ViTs inherently support variable-token inference, they do not facilitate dynamic computations for variable channels. To overcome this limitation, we propose a meta-network using weight-sharing techniques to support arbitrary channels of the Multi-head Self-Attention and Multi-layer Perceptron layers, serving as a foundational model for architectural decision-making. Second, simultaneously optimizing the structure of…
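
The weight-sharing idea mentioned in the abstract can be illustrated with a toy layer: one full-size weight matrix is stored, and a narrower layer is obtained by using only a slice of it. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the class name `SharedLinear`, the choice of keeping the leading channels, and the random initialization are all hypothetical.

```python
import random


class SharedLinear:
    """Toy linear layer whose narrower variants reuse a slice of one weight matrix."""

    def __init__(self, max_in: int, max_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.max_in, self.max_out = max_in, max_out
        # A single full-size weight matrix, stored as max_out rows of max_in values.
        self.weight = [[rng.uniform(-1.0, 1.0) for _ in range(max_in)]
                       for _ in range(max_out)]
        self.bias = [0.0] * max_out

    def forward(self, x: list, out_ch: int) -> list:
        """Apply the layer using the first len(x) input and first out_ch output channels."""
        in_ch = len(x)
        if not (0 < in_ch <= self.max_in and 0 < out_ch <= self.max_out):
            raise ValueError("requested width exceeds the shared weight matrix")
        return [
            sum(w * v for w, v in zip(self.weight[o][:in_ch], x)) + self.bias[o]
            for o in range(out_ch)
        ]


# An MLP-like pair of layers evaluated at a reduced hidden width (16 of 32 channels).
fc1 = SharedLinear(max_in=8, max_out=32)
fc2 = SharedLinear(max_in=32, max_out=8)

x = [0.5] * 8
hidden = [max(0.0, h) for h in fc1.forward(x, out_ch=16)]  # ReLU as a stand-in activation
y = fc2.forward(hidden, out_ch=8)
print(len(hidden), len(y))
```

Because every width reads from the same stored matrix, switching the channel count per input needs no extra parameters, which is the property a per-input architectural decision relies on.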

Code & Models

Repositories

childtang/prance
PyTorch · Official

Taxonomy

Topics: Digital Media Forensic Detection · Advanced Steganography and Watermarking Techniques · Advanced Image Processing Techniques

Methods: Attention Is All You Need · Softmax · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Dropout