SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs

Jinhong Deng; Wen Li; Joey Tianyi Zhou; Yang He

arXiv:2510.24214 · cs.CV · October 29, 2025

TL;DR

SCOPE introduces a novel token pruning method for multimodal large language models that jointly models saliency and coverage to better preserve semantic information while reducing computational costs.

Contribution

The paper proposes SCOPE, a new token pruning strategy that combines saliency and coverage modeling, improving semantic preservation in multimodal LLMs.

Findings

01. Outperforms prior token pruning methods on multiple benchmarks.

02. Effectively reduces computational overhead without sacrificing accuracy.

03. Demonstrates robustness across different vision-language tasks.

Abstract

Multimodal Large Language Models (MLLMs) typically process a large number of visual tokens, leading to considerable computational overhead, even though many of these tokens are redundant. Existing visual token pruning methods primarily focus on selecting the most salient tokens based on attention scores, resulting in the semantic incompleteness of the selected tokens. In this paper, we propose a novel visual token pruning strategy, called Saliency-Coverage Oriented token Pruning for Efficient MLLMs (SCOPE), to jointly model both the saliency and coverage of the selected visual tokens to better preserve semantic completeness. Specifically, we introduce a set-coverage for a given set of selected tokens, computed based on the token relationships. We then define a token-coverage gain for each unselected token, quantifying how much additional…
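The abstract is cut off before it says how saliency and coverage gain are combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes (none of this is confirmed by the text above) that token relationships are non-negative cosine similarities, that set-coverage is the sum over all tokens of each token's best similarity to the selected set, and that the per-token score is saliency raised to a hypothetical exponent `alpha` multiplied by the coverage gain. The function name and all parameters are illustrative.

```python
import numpy as np


def greedy_saliency_coverage_select(features, saliency, k, alpha=1.0):
    """Greedily pick k token indices, trading off saliency and coverage.

    Assumptions (illustrative, not taken from the paper):
      - features: (n, d) array of visual token features
      - saliency: (n,) array of non-negative scores, e.g. attention weights
      - score = saliency**alpha * coverage_gain
    """
    features = np.asarray(features, dtype=float)
    saliency = np.asarray(saliency, dtype=float)

    # Token relationships: cosine similarity, clipped to be non-negative.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = np.clip(unit @ unit.T, 0.0, None)

    n = sim.shape[0]
    covered = np.zeros(n)             # best similarity of each token to the selected set
    remaining = np.ones(n, dtype=bool)
    selected = []

    for _ in range(min(k, n)):
        # Coverage gain of candidate j: total increase in `covered` if j were added.
        gain = np.maximum(sim - covered[:, None], 0.0).sum(axis=0)
        score = (saliency ** alpha) * gain
        score[~remaining] = -np.inf   # never re-select a token
        j = int(np.argmax(score))
        selected.append(j)
        remaining[j] = False
        covered = np.maximum(covered, sim[:, j])

    return selected


# Toy input: tokens 0 and 1 are duplicates, token 2 is distinct but less salient.
toy_features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
toy_saliency = np.array([0.5, 0.45, 0.3])
print(greedy_saliency_coverage_select(toy_features, toy_saliency, k=2))
```

On this toy input, selecting purely by saliency would keep the two duplicate tokens, whereas the coverage term makes the second pick the distinct token, which is the kind of semantic incompleteness the abstract attributes to attention-only pruning.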

Videos

SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs · SlidesLive