Short-LVLM: Compressing and Accelerating Large Vision-Language Models by Pruning Redundant Layers

Ji Ma; Wei Suo; Peng Wang; Yanning Zhang

arXiv:2507.23362 · cs.CV · August 1, 2025

TL;DR

This paper introduces Short-LVLM, a training-free framework that compresses large vision-language models by pruning redundant layers. It preserves important vision-language tokens and reduces inter-layer feature gaps, yielding a more efficient model without retraining.

Contribution

The paper demonstrates the ineffectiveness of NLP layer pruning methods on LVLMs and proposes a new, training-free, model-agnostic framework that improves efficiency while maintaining performance.

Findings

01. Short-LVLM achieves better performance-efficiency trade-offs.

02. It is training-free and highly compatible with existing models.

03. The method effectively preserves important vision-language tokens.

Abstract

Although large vision-language models (LVLMs) have demonstrated impressive capabilities in multi-modal understanding and reasoning, their practical applications are still limited by massive model parameters and high computational costs. Recent efforts from natural language processing (NLP) have shown the effectiveness of layer pruning, offering a plausible training-free compression solution. However, due to the modality divergence between vision and language, it is unclear whether these NLP techniques are still effective in LVLMs. In this paper, we empirically prove that directly applying these layer pruning methods to LVLMs is ineffective. Through extensive experiments, we find that non-essential vision-language (VL) tokens and inter-layer feature gaps pose critical challenges to pruning layers in LVLMs. Based on these insights, we propose a novel framework Short-LVLM (SVL) that can…
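For readers unfamiliar with the kind of training-free layer pruning the abstract refers to, the following is a minimal sketch of one common similarity-based heuristic: score each layer by how little it changes its input, then mark the least-changing layers for removal. It is not the Short-LVLM method, which the abstract says was designed precisely because heuristics of this kind transfer poorly to LVLMs. The function names, the cosine-similarity criterion, and the nested-list layout of `hidden_states` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def layer_redundancy(hidden_states):
    """Mean input/output cosine similarity for each layer.

    Assumed layout (hypothetical): hidden_states[i] holds the token vectors
    entering layer i, and hidden_states[-1] holds the final output, so a model
    with N layers supplies N + 1 entries.
    """
    scores = []
    for layer_in, layer_out in zip(hidden_states, hidden_states[1:]):
        sims = [cosine(x, y) for x, y in zip(layer_in, layer_out)]
        scores.append(sum(sims) / len(sims))
    return scores


def select_layers_to_prune(hidden_states, num_prune):
    """Indices of the num_prune layers whose output is most similar to their input."""
    scores = layer_redundancy(hidden_states)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_prune])
```

In this sketch every token contributes equally to a layer's score, whether it comes from the image or the text. The abstract identifies non-essential vision-language tokens as one of the challenges for pruning LVLMs, which is one reason a uniform average like this may be a poor fit there.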

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques