Mostly Text, Smart Visuals: Asymmetric Text-Visual Pruning for Large Vision-Language Models
Sijie Li, Biao Qian, Jungong Han

TL;DR
This paper introduces ATV-Pruning, an asymmetric pruning method for large vision-language models that treats textual and visual calibration tokens differently, according to how sensitive each pathway is to pruning, with the aim of improving efficiency at little cost in performance.
Contribution
The paper proposes an asymmetric pruning approach for LVLMs that accounts for modality-specific behavior. Its components include a calibration pool and layer-adaptive token selection.
Findings
ATV-Pruning outperforms state-of-the-art pruning methods on multimodal benchmarks.
The textual pathway is more sensitive to pruning than the visual pathway, so it should be calibrated with text tokens.
The visual pathway exhibits high redundancy, tolerating even 50% sparsity.
Abstract
Network pruning is an effective technique for enabling lightweight Large Vision-Language Models (LVLMs), which primarily incorporates both weights and activations into the importance metric. However, existing efforts typically process calibration data from different modalities in a unified manner, overlooking modality-specific behaviors. This raises a critical challenge: how to address the divergent behaviors of textual and visual tokens for accurate pruning of LVLMs. To this end, we systematically investigate the sensitivity of visual and textual tokens to the pruning operation by decoupling their corresponding weights, revealing that: (i) the textual pathway should be calibrated via text tokens, since it exhibits higher sensitivity than the visual pathway; (ii) the visual pathway exhibits high redundancy, permitting even 50% sparsity. Motivated by these insights, we propose a simple…
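The abstract mentions importance metrics that combine weights and activations, and contrasts unified calibration with modality-aware calibration. The sketch below illustrates that general idea only. It assumes a Wanda-style score, |W_ij| multiplied by the L2 norm of input feature j over the calibration tokens, which may differ from the metric ATV-Pruning actually uses (the abstract is truncated before the method is described). The function names, array shapes, and random activations are all hypothetical.

```python
import numpy as np

def activation_aware_scores(weight, calib_acts):
    """Score each weight as |W_ij| * ||X_j||_2 (a Wanda-style metric, assumed here).

    weight:     (out_features, in_features) weight matrix
    calib_acts: (num_tokens, in_features) activations from calibration tokens
    """
    feature_norms = np.linalg.norm(calib_acts, axis=0)  # one norm per input feature
    return np.abs(weight) * feature_norms[None, :]

def prune_rows(weight, scores, sparsity):
    """Zero the lowest-scoring fraction of weights in every output row."""
    n_prune = int(weight.shape[1] * sparsity)
    pruned = weight.copy()
    if n_prune == 0:
        return pruned
    drop = np.argsort(scores, axis=1)[:, :n_prune]  # indices of the weakest weights
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
text_acts = rng.normal(size=(32, 16))     # hypothetical text-token activations
visual_acts = rng.normal(size=(128, 16))  # hypothetical visual-token activations

# Unified calibration: pool tokens from both modalities into one score.
pooled_acts = np.concatenate([text_acts, visual_acts], axis=0)
unified = prune_rows(W, activation_aware_scores(W, pooled_acts), sparsity=0.5)

# Modality-aware calibration: score the same weights from text tokens only.
text_only = prune_rows(W, activation_aware_scores(W, text_acts), sparsity=0.5)
```

Both calls remove half of the weights in each row, but the two masks can differ because the feature norms depend on which tokens are used for calibration. That dependence is what makes the choice of calibration modality matter.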
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
