ReDiPrune: Relevance-Diversity Pre-Projection Token Pruning for Efficient Multimodal LLMs

An Yu; Ting Yu Tsai; Zhenfei Zhang; Weiheng Lu; Felix X.-F. Ye; Ming-Ching Chang

arXiv:2603.24680 · cs.CV · April 1, 2026

TL;DR

ReDiPrune is a training-free token pruning method for multimodal LLMs that selects relevant and diverse visual tokens before the projection layer, improving efficiency without retraining.

Contribution

ReDiPrune is a plug-and-play token pruning technique that operates before the vision-language projector, improving the accuracy-efficiency trade-off of multimodal LLMs.
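The placement described above can be sketched as a thin hook between the encoder and the projector. This is a minimal illustration under stated assumptions, not the UA-CVML/ReDiPrune code: `prune_before_projector`, `toy_projector`, and `keep_first_two` are hypothetical names, and the toy projector and selector stand in for the real modules. It shows only the structural point: selection runs on raw encoder features, so the projector (and later the LLM) processes k tokens instead of N.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def prune_before_projector(
    vision_tokens: Sequence[Vector],
    select: Callable[[Sequence[Vector]], List[int]],
    projector: Callable[[Vector], Vector],
) -> List[Vector]:
    # Hypothetical hook placed between the vision encoder and the projector.
    # 1. Choose token indices from the raw (pre-projection) encoder features.
    kept = select(vision_tokens)
    # 2. Project only the surviving tokens; the LLM then receives len(kept) tokens.
    return [projector(vision_tokens[i]) for i in kept]


# Toy stand-ins (assumptions, not the real encoder, selector, or projector).
calls = 0


def toy_projector(v: Vector) -> Vector:
    global calls
    calls += 1  # count how many tokens the projector actually processes
    return [2.0 * x for x in v]


def keep_first_two(toks: Sequence[Vector]) -> List[int]:
    return [0, 1]  # placeholder selector; a real one would score tokens


tokens = [[float(i), float(i) + 0.5] for i in range(10)]  # 10 fake encoder tokens
out = prune_before_projector(tokens, keep_first_two, toy_projector)
print(len(out), calls)  # 2 2
```

Because the hook only changes which rows reach the projector, the encoder, projector, and LLM weights stay untouched, which is what makes this kind of pruning training-free.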

Findings

1. Retaining only 15% of visual tokens improves accuracy on EgoSchema by 2.0%.
2. ReDiPrune reduces computation by more than 6× in TFLOPs.
3. It outperforms post-projection pruning methods across multiple benchmarks.
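To give a rough sense of scale for the 15% retention figure, the sketch below does a back-of-envelope calculation. It is an idealized estimate under loud assumptions, not how the paper measured TFLOPs: it pretends cost depends only on the visual token count N, ignoring text tokens and fixed overheads. The visual token count of 1000 is hypothetical, and `kept_tokens` and `idealized_speedup` are illustrative helpers.

```python
def kept_tokens(n_visual: int, keep_pct: int) -> int:
    # Visual tokens retained at keep_pct percent, rounded up (exact integer ceil).
    return -(-n_visual * keep_pct // 100)


def idealized_speedup(keep_pct: int, exponent: int) -> float:
    # Speedup of a cost term that scales as N**exponent in the visual token count.
    # exponent=1: per-token terms (projector, MLP blocks); exponent=2: attention scores.
    return (100 / keep_pct) ** exponent


n = 1000  # hypothetical visual token count, not a figure from the paper
print(kept_tokens(n, 15))                  # 150
print(round(idealized_speedup(15, 1), 2))  # 6.67
print(round(idealized_speedup(15, 2), 2))  # 44.44
```

Under these assumptions the per-token terms shrink by about 6.7×, which is in the same range as the reported reduction of more than 6×. Text tokens and other unpruned costs would pull a real measurement below this idealized bound.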

Abstract

Recent multimodal large language models are computationally expensive because their Transformers must process a large number of visual tokens. We present ReDiPrune, a training-free token pruning method applied before the vision-language projector, where visual features remain rich and discriminative. Unlike post-projection pruning methods that operate on compressed representations, ReDiPrune selects informative tokens directly from vision encoder outputs, preserving fine-grained spatial and semantic cues. Each token is scored by a lightweight rule that jointly considers text-conditioned relevance and max-min diversity, ensuring that the selected tokens are both query-relevant and non-redundant. ReDiPrune is fully plug-and-play, requiring no retraining or architectural modifications, and can be seamlessly inserted between the encoder and projector. Across four video and five image benchmarks, it…
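The scoring idea in the abstract, combining text-conditioned relevance with max-min diversity, can be illustrated with a greedy selector. This is a sketch under explicit assumptions, not the paper's exact rule. The exact formula is not given here, so the following choices are illustrative: cosine similarity to a text embedding as relevance, 1 minus cosine similarity as distance, the weight `lam`, and the helper names `cosine` and `select_tokens`. At each step the selector picks the token that maximizes `lam * relevance + (1 - lam) * (distance to the nearest already-selected token)`.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity with a tiny epsilon to avoid division by zero.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def select_tokens(tokens: Sequence[Vector], text: Vector, k: int, lam: float = 0.5) -> List[int]:
    """Greedy relevance + max-min diversity selection (lam is a hypothetical weight)."""
    relevance = [cosine(t, text) for t in tokens]
    # Distance from each token to its nearest selected token (inf until something is chosen).
    min_dist = [math.inf] * len(tokens)
    selected: List[int] = []
    chosen = set()
    for _ in range(min(k, len(tokens))):
        best, best_score = -1, -math.inf
        for i in range(len(tokens)):
            if i in chosen:
                continue
            # First pick has no reference set, so it is driven by relevance alone.
            div = min_dist[i] if selected else 0.0
            score = lam * relevance[i] + (1 - lam) * div
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        chosen.add(best)
        # Update max-min diversity: distance to the closest selected token.
        for i in range(len(tokens)):
            min_dist[i] = min(min_dist[i], 1.0 - cosine(tokens[i], tokens[best]))
    return sorted(selected)  # keep original (spatial) order for the projector


text = [1.0, 0.0]
tokens = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7], [0.0, 1.0]]  # tokens 0 and 1 are near-duplicates
print(select_tokens(tokens, text, k=2))           # [0, 2]
print(select_tokens(tokens, text, k=2, lam=1.0))  # [0, 1] (relevance only)
```

With relevance alone, the two near-duplicate tokens both survive. Adding the max-min term swaps the redundant token for a less similar but still relevant one, which is the "relevant and non-redundant" behavior the abstract describes.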

Code & Models

Repositories

UA-CVML/ReDiPrune (GitHub)