Mostly Text, Smart Visuals: Asymmetric Text-Visual Pruning for Large Vision-Language Models

Sijie Li; Biao Qian; Jungong Han

arXiv:2603.16001·cs.CV·March 18, 2026

Open Access

TL;DR

This paper introduces ATV-Pruning, an asymmetric pruning method for large vision-language models (LVLMs). It treats textual and visual tokens differently when calibrating which weights to prune, based on how sensitive the textual and visual pathways are to pruning, with the aim of a lighter model that retains its accuracy.

Contribution

The paper presents a new asymmetric pruning approach that considers modality-specific behaviors, including a calibration pool and layer-adaptive token selection, for more effective LVLM pruning.

Findings

01 ATV-Pruning outperforms state-of-the-art pruning methods on multimodal benchmarks.

02 The textual pathway is more sensitive to pruning than the visual pathway, so it should be calibrated with text tokens.

03 The visual pathway exhibits high redundancy and tolerates even 50% sparsity.

Abstract

Network pruning is an effective technique for enabling lightweight Large Vision-Language Models (LVLMs), and it typically incorporates both weights and activations into the importance metric. However, existing efforts process calibration data from different modalities in a unified manner, overlooking modality-specific behaviors. This raises a critical challenge: how to address the divergent behaviors of textual and visual tokens for accurate pruning of LVLMs. To this end, we systematically investigate the sensitivity of visual and textual tokens to the pruning operation by decoupling their corresponding weights, revealing that: (i) the textual pathway should be calibrated via text tokens, since it exhibits higher sensitivity than the visual pathway; (ii) the visual pathway exhibits high redundancy, permitting even 50% sparsity. Motivated by these insights, we propose a simple…
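To make the idea of an activation-aware importance metric and modality-specific calibration concrete, here is a minimal sketch in NumPy. It is not the paper's implementation: the score |W_ij| · ‖X_j‖₂ is a commonly used weight-times-activation metric assumed here for illustration, and the function names, shapes, and random data are all hypothetical. The sketch only contrasts calibrating on a unified token pool with calibrating on text tokens alone.

```python
import numpy as np

def activation_aware_scores(weight, calib_acts):
    """Score each weight by |W_ij| * ||X_j||_2 (assumed metric, not the paper's).

    weight:     (out_features, in_features)
    calib_acts: (num_tokens, in_features) calibration activations
    """
    feat_norm = np.linalg.norm(calib_acts, axis=0)  # one norm per input feature
    return np.abs(weight) * feat_norm[None, :]

def prune_rows(weight, scores, sparsity):
    """Zero the lowest-scoring fraction of weights in every output row."""
    n_prune = int(weight.shape[1] * sparsity)
    pruned = weight.copy()
    if n_prune == 0:
        return pruned
    lowest = np.argsort(scores, axis=1)[:, :n_prune]
    np.put_along_axis(pruned, lowest, 0.0, axis=1)
    return pruned

# Hypothetical data: one linear layer plus text and visual token activations.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
text_acts = rng.normal(size=(32, 16))     # few text tokens
visual_acts = rng.normal(size=(256, 16))  # many visual tokens

# Unified calibration: both modalities pooled together.
unified_acts = np.concatenate([text_acts, visual_acts], axis=0)
W_unified = prune_rows(W, activation_aware_scores(W, unified_acts), 0.5)

# Text-only calibration: only text tokens inform the scores.
W_text = prune_rows(W, activation_aware_scores(W, text_acts), 0.5)
```

In the unified case the far more numerous visual tokens dominate the per-feature norms, which is one plausible reading of why pooled calibration can overlook text-specific behavior. How the paper actually builds its calibration pool and selects visual tokens per layer is not specified in this excerpt.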

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications