RCP: Representation Consistency Pruner for Mitigating Distribution Shift in Large Vision-Language Models

Jianwei Zhang; Chaoning Zhang; Sihan Cao; Wang Liu; Pengcheng Zheng; Jiaxin Huang; Caiyan Qin; Yalan Ye; Wei Dong; Yang Yang

arXiv:2604.04972·cs.CV·April 8, 2026

TL;DR

RCP is a pruning framework for large vision-language models that removes visual tokens to cut computational cost and uses a delayed repair mechanism to preserve performance.

Contribution

It introduces a cross-attention-based cumulative pruning method paired with a delayed repair adapter, enabling efficient token reduction without fine-tuning the entire model.

Findings

01 Removes up to 88.9% of visual tokens

02 Reduces FLOPs by up to 85.7%

03 Maintains performance with marginal accuracy drop
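To make the headline percentages concrete, the short sketch below converts them into retained-token counts and reduction ratios. The input of 576 visual tokens per image is an illustrative assumption and is not stated on this page; the function names are hypothetical helpers, not part of RCP.

```python
def retained_tokens(total_tokens: int, removed_fraction: float) -> int:
    """Number of tokens left after removing the given fraction (rounded)."""
    return round(total_tokens * (1.0 - removed_fraction))


def reduction_ratio(removed_fraction: float) -> float:
    """How many times smaller the remaining quantity is than the original."""
    return 1.0 / (1.0 - removed_fraction)


# Assumption: 576 visual tokens per image (illustrative, not from the source).
ASSUMED_VISUAL_TOKENS = 576

kept = retained_tokens(ASSUMED_VISUAL_TOKENS, 0.889)  # 64 tokens remain
token_ratio = reduction_ratio(0.889)                  # about 9x fewer tokens
flops_ratio = reduction_ratio(0.857)                  # about 7x fewer FLOPs

print(kept, round(token_ratio, 1), round(flops_ratio, 1))
```

Under that assumption, removing 88.9% of the tokens leaves 64 of 576. The FLOPs ratio comes out smaller than the token ratio, which is what one would expect if part of the compute does not scale with the visual token count.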

Abstract

Large Vision-Language Models (LVLMs) suffer from prohibitive inference costs due to the massive number of visual tokens processed by the language decoder. Existing pruning methods often lead to significant performance degradation because the irreversible removal of visual tokens causes a distribution shift in the hidden states that deviates from the pre-trained full-token regime. To address this, we propose the Representation Consistency Pruner (RCP), a novel framework that integrates cumulative visual token pruning with a delayed repair mechanism. Specifically, we introduce a cross-attention pruner that leverages the intrinsic attention of the LLM as a baseline to predict cumulative masks, ensuring consistent and monotonic token reduction across layers. To compensate for the resulting information loss, we design a delayed repair adapter, denoted DRA, which caches…
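The property the abstract attributes to cumulative masks, that token reduction is monotonic across layers, can be illustrated with a minimal sketch. This is not the paper's pruner: the per-layer keep scores, the fixed threshold, and the function names are all assumptions made for illustration, and the actual method predicts its masks from cross-attention rather than thresholding given scores.

```python
from typing import List


def cumulative_masks(
    layer_scores: List[List[float]], threshold: float = 0.5
) -> List[List[bool]]:
    """Turn per-layer keep scores into cumulative keep masks.

    A token is kept at a layer only if it was kept at every earlier
    layer AND its score at this layer reaches the threshold, so a
    pruned token can never reappear deeper in the network.
    """
    if not layer_scores:
        return []
    num_tokens = len(layer_scores[0])
    alive = [True] * num_tokens  # every visual token starts unpruned
    masks: List[List[bool]] = []
    for scores in layer_scores:
        if len(scores) != num_tokens:
            raise ValueError("every layer must score the same number of tokens")
        # Logical AND with the previous mask enforces monotonic reduction.
        alive = [a and (s >= threshold) for a, s in zip(alive, scores)]
        masks.append(alive)
    return masks


def kept_counts(masks: List[List[bool]]) -> List[int]:
    """Number of surviving tokens at each layer."""
    return [sum(mask) for mask in masks]


# Hypothetical scores for 4 visual tokens over 3 decoder layers.
scores = [
    [0.9, 0.4, 0.8, 0.7],
    [0.9, 0.9, 0.3, 0.6],
    [0.2, 0.9, 0.9, 0.6],
]
masks = cumulative_masks(scores)
print(kept_counts(masks))  # [3, 2, 1]
```

Token 1 scores 0.9 in the second and third layers but stays pruned because it was dropped in the first. That irreversibility is what makes the mask cumulative, and per the abstract it is the resulting information loss that the delayed repair adapter is meant to compensate for.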
