Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Qizhe Zhang; Mengzhen Liu; Lichen Li; Ming Lu; Yuan Zhang; Junwen Pan; Qi She; Shanghang Zhang

arXiv:2506.10967 · cs.CV · July 2, 2025

TL;DR

This paper introduces CDPruner, a training-free, model-agnostic visual token pruning method for multimodal large language models (MLLMs) that maximizes the conditional diversity of retained tokens to cut inference cost while largely preserving accuracy.

Contribution

The paper proposes a token pruning approach that maximizes conditional diversity using a determinantal point process (DPP), outperforming existing attention-based and similarity-based methods.

Findings

01. Achieves state-of-the-art results on vision-language benchmarks.

02. Reduces FLOPs by 95% and CUDA latency by 78% on LLaVA.

03. Maintains 94% of original accuracy with high token reduction.

Abstract

In multimodal large language models (MLLMs), the length of input visual tokens is often significantly greater than that of their textual counterparts, leading to a high inference cost. Many works aim to address this issue by removing redundant visual tokens. However, current approaches either rely on attention-based pruning, which retains numerous duplicate tokens, or use similarity-based pruning, overlooking the instruction relevance, consequently causing suboptimal performance. In this paper, we go beyond attention or similarity by proposing a novel visual token pruning method named CDPruner, which maximizes the conditional diversity of retained tokens. We first define the conditional similarity between visual tokens conditioned on the instruction, and then reformulate the token pruning problem with determinantal point process (DPP) to maximize the conditional diversity of the…
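The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the official repository for that), and several details are assumptions made for illustration: the helper names `conditional_kernel` and `greedy_dpp_map`, the use of cosine similarity, the mapping of relevance scores to (0, 1], and the greedy MAP procedure used to pick tokens. The kernel weights pairwise visual-token similarity by each token's relevance to the instruction, so a subset with a large determinant is both relevant and non-redundant.

```python
import numpy as np


def conditional_kernel(visual, instruction):
    """Build a DPP kernel from token similarity weighted by instruction relevance.

    visual: (n, d) visual token features; instruction: (d,) instruction embedding.
    Both the cosine similarity and the relevance rescaling are assumptions.
    """
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    q = instruction / np.linalg.norm(instruction)
    similarity = v @ v.T                      # Gram matrix, positive semidefinite
    relevance = (v @ q + 1.0) / 2.0 + 1e-6    # map cosine from [-1, 1] to (0, 1]
    # Scaling rows and columns by relevance keeps the kernel positive semidefinite.
    return relevance[:, None] * similarity * relevance[None, :]


def greedy_dpp_map(kernel, k):
    """Greedily pick up to k indices that approximately maximize det(kernel[S, S])."""
    n = kernel.shape[0]
    gains = np.diag(kernel).astype(float).copy()  # marginal gain of adding each item
    chol = np.zeros((k, n))                       # incremental Cholesky-style factors
    selected = []
    for t in range(k):
        j = int(np.argmax(gains))
        if gains[j] <= 1e-10:                     # remaining tokens add no diversity
            break
        selected.append(j)
        # Project every token onto the newly selected one, given earlier selections.
        e = (kernel[j] - chol[:t, j] @ chol[:t]) / np.sqrt(gains[j])
        chol[t] = e
        gains = gains - e ** 2
        gains[selected] = -np.inf                 # never pick the same token twice
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(576, 64))           # hypothetical visual token features
    prompt = rng.normal(size=64)                  # hypothetical instruction embedding
    keep = greedy_dpp_map(conditional_kernel(tokens, prompt), k=64)
    print(len(keep), "tokens retained")
```

In this sketch an exact duplicate of an already selected token has a marginal gain of roughly zero, so it is skipped in favor of a token that adds new information, while low-relevance tokens start with a smaller gain because their kernel diagonal is scaled down.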

Code & Models

Repositories

theia-4869/cdpruner (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning

Methods: Softmax · Attention Is All You Need · Pruning