ERASE: Eliminating Redundant Visual Tokens via Adaptive Two-Stage Token Pruning

Yuna Lee; Kyoungho Min; Yulhwa Kim

arXiv:2605.09982·cs.CV·May 12, 2026

TL;DR

ERASE is a two-stage, adaptive vision token pruning framework for vision-language models. At 85% token pruning it retains 89.46% of accuracy on Qwen2.5-VL-7B, cutting the computational cost of multimodal understanding.

Contribution

It introduces a two-stage token pruning method whose pruning strategy adapts to the complexity of the input image. Prior approaches rely mainly on learned semantic features within the model to capture visual redundancy and do not adapt to the input.

Findings

01

At 85% token pruning, ERASE retains 89.46% of accuracy on Qwen2.5-VL-7B.

02

ERASE outperforms previous methods, which retain only 78.1% of accuracy at the same pruning ratio.

03

The framework effectively balances token reduction and model performance.

Abstract

Recent advancements in Vision-Language Models (VLMs) enable large language models (LLMs) to process high-resolution images, significantly improving real-world multimodal understanding. However, this capability introduces a large number of vision tokens, resulting in substantial computational overhead. To mitigate this issue, various vision token pruning methods have been proposed. Nevertheless, existing approaches predominantly rely on learned semantic features within the model to capture visual redundancy. Moreover, they lack adaptive mechanisms to adjust pruning strategies according to the complexity of the input image. In this paper, we propose ERASE, a two-stage vision token pruning framework that identifies and retains salient tokens through pruning strategies adaptive to image complexity. Experimental results demonstrate that ERASE significantly reduces vision tokens while…
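To make the pruning ratio concrete, the following is a minimal sketch of generic score-based token pruning at a fixed ratio. It is not ERASE's algorithm: the function names `keep_count` and `prune_tokens` and the per-token saliency scores are hypothetical, and the abstract does not say how ERASE scores tokens or adapts the ratio to image complexity. An adaptive scheme would presumably vary `prune_ratio` per image.

```python
def keep_count(num_tokens: int, prune_ratio: float) -> int:
    """Number of tokens left after pruning the given fraction (at least one)."""
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    return max(1, round(num_tokens * (1.0 - prune_ratio)))


def prune_tokens(scores: list[float], prune_ratio: float) -> list[int]:
    """Return indices of the retained tokens, in their original order.

    `scores` is a hypothetical per-token saliency list (higher = more salient);
    how such scores are obtained is an assumption outside this sketch.
    """
    k = keep_count(len(scores), prune_ratio)
    # Rank token indices by descending score and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore positional order so the kept tokens stay in sequence.
    return sorted(ranked[:k])


# Illustrative usage: 20 tokens with distinct made-up scores, pruned at 85%.
scores = [float((i * 7) % 20) for i in range(20)]
kept = prune_tokens(scores, 0.85)
print(kept)                    # [11, 14, 17]
print(keep_count(1000, 0.85))  # 150
```

At an 85% ratio, an image encoded as 1,000 vision tokens would be reduced to 150 under this scheme. The token counts here are illustrative and not taken from the paper.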

Code & Models

Repositories

Tuna-Luna/ERASE (GitHub)
