Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models

Wei Suo; Ji Ma; Mengyang Sun; Lin Yuanbo Wu; Peng Wang; Yanning Zhang

arXiv:2412.06458 · cs.CV · August 1, 2025

Open Access

TL;DR

This paper introduces the Pruning All-Rounder (PAR), a flexible framework that adaptively prunes tokens and layers in large vision-language models, significantly improving inference efficiency while maintaining performance.

Contribution

The paper proposes a meta-router-based pruning framework that adaptively organizes pruning flows across both tokens and layers, improving inference efficiency without retraining the underlying model.

Findings

01. PAR achieves a better balance between performance and efficiency.

02. The framework offers multiple pruning configurations for different scenarios.

03. Code is publicly available for reproducibility.

Abstract

Although Large Vision-Language Models (LVLMs) have achieved impressive results, their high computational cost is a significant barrier to wide application. Most existing approaches to improving inference efficiency fall into two categories: parameter-dependent strategies and token-dependent strategies. However, parameter-dependent methods require retraining the LVLM to recover performance, while token-dependent strategies struggle to consistently select the most relevant tokens. In this paper, we systematically analyze these challenges and provide a series of insights for inference acceleration. Based on these findings, we propose a novel framework, the Pruning All-Rounder (PAR). Unlike previous work, PAR develops a meta-router to adaptively organize pruning flows across both tokens and layers. Trained in a self-supervised manner, our method…
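
To make the two pruning axes named in the abstract concrete, here is a minimal Python sketch of generic token pruning (keep the top-scoring tokens) combined with layer skipping. It is not PAR's meta-router: the truncated abstract does not say how the router scores tokens or chooses layers, so `prune_tokens`, `run_with_pruning`, `score_fn`, `skip_layers`, `prune_at`, and `keep_ratio` are all hypothetical names chosen for illustration, and the fixed schedule stands in for whatever the learned router would decide.

```python
from typing import Callable, List, Sequence, Set

Token = List[float]                          # one token embedding
Layer = Callable[[List[Token]], List[Token]]  # a layer maps tokens to tokens


def prune_tokens(tokens: List[Token], scores: Sequence[float],
                 keep_ratio: float) -> List[Token]:
    """Keep the highest-scoring fraction of tokens, preserving their order.

    Assumption: `scores` holds one importance value per token. How such
    scores are obtained is not specified by the source.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if len(scores) != len(tokens):
        raise ValueError("need exactly one score per token")
    k = max(1, round(len(tokens) * keep_ratio))
    # Indices of the k largest scores, then restored to sequence order.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


def run_with_pruning(tokens: List[Token], layers: Sequence[Layer],
                     skip_layers: Set[int], prune_at: int, keep_ratio: float,
                     score_fn: Callable[[List[Token]], List[float]]) -> List[Token]:
    """Run `layers` in order with a fixed, hand-written pruning schedule.

    Token pruning happens once, on entry to layer `prune_at`.
    Layer pruning skips every layer whose index is in `skip_layers`.
    """
    for idx, layer in enumerate(layers):
        if idx == prune_at:
            tokens = prune_tokens(tokens, score_fn(tokens), keep_ratio)
        if idx in skip_layers:
            continue  # layer pruning: pass tokens through unchanged
        tokens = layer(tokens)
    return tokens
```

In this sketch the schedule is supplied by the caller. An adaptive framework of the kind the abstract describes would instead choose which tokens and layers to prune per input; that decision logic is outside what this example shows.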

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling

Methods: Pruning