Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering
Yangfu Li, Hongjian Zhan, Tianyi Chen, Qi Liu, Yue Lu

TL;DR
This paper introduces a multi-objective approach to visual token pruning that balances prompt alignment against visual preservation, yielding efficiency gains with little loss in performance.
Contribution
It derives a closed-form error bound for token pruning, reveals the trade-off between objectives, and proposes MoB, a scalable bi-objective covering method with provable performance guarantees.
Findings
Preserves 96.4% of performance with only 11.1% of the visual tokens
Accelerates models by 1.3-1.5× with negligible loss
Effective across multiple vision-language tasks
Abstract
Existing visual token pruning methods target prompt alignment and visual preservation with static strategies, overlooking the varying relative importance of these objectives across tasks, which leads to inconsistent performance. To address this, we derive the first closed-form error bound for visual token pruning based on the Hausdorff distance, uniformly characterizing the contributions of both objectives. Moreover, leveraging ε-covering theory, we reveal an intrinsic trade-off between these objectives and quantify their optimal attainment levels under a fixed budget. To practically handle this trade-off, we propose Multi-Objective Balanced Covering (MoB), which reformulates visual token pruning as a bi-objective covering problem. In this framework, the attainment trade-off reduces to budget allocation via greedy radius trading. MoB offers a provable performance bound and…
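The abstract's two geometric ingredients can be made concrete with a small sketch. A kept set S is an ε-cover of the full token set X when every point of X lies within ε of some point of S, which is the same as saying the directed Hausdorff distance from X to S is at most ε. The code below is an illustration under stated assumptions, not the authors' MoB algorithm: the function names, the Euclidean metric, and the two-stage split of the budget (`k_prompt` tokens picked for prompt alignment, the remainder picked by farthest-point sampling for visual coverage) are all hypothetical choices made for this example.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def directed_hausdorff(source, target):
    """Largest distance from any source point to its nearest target point."""
    return pairwise_dist(source, target).min(axis=1).max()


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def select_tokens(visual, prompt, k_prompt, k_total):
    """Hypothetical two-stage selection (an assumption, not the paper's method).

    Stage 1 spends k_prompt of the budget on the visual tokens nearest to
    any prompt token. Stage 2 spends the rest on farthest-point sampling,
    which greedily shrinks the covering radius of the kept set.
    """
    # Stage 1: prompt alignment, nearest visual tokens to the prompt set.
    d_to_prompt = pairwise_dist(visual, prompt).min(axis=1)
    chosen = [int(i) for i in np.argsort(d_to_prompt, kind="stable")[:k_prompt]]

    # Distance from every visual token to its nearest already-kept token.
    if chosen:
        min_dist = pairwise_dist(visual, visual[chosen]).min(axis=1)
    else:
        min_dist = np.full(len(visual), np.inf)

    # Stage 2: visual preservation, repeatedly keep the worst-covered token.
    while len(chosen) < k_total:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(visual - visual[nxt], axis=1))
    return np.array(chosen)


# Toy data: four 2-D "visual tokens" on a line and one "prompt token".
visual = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
prompt = np.array([[0.0, 0.0]])

kept = select_tokens(visual, prompt, k_prompt=1, k_total=2)
radius = directed_hausdorff(visual, visual[kept])  # covering radius of the kept set
```

Shifting budget from `k_prompt` to the farthest-point stage lowers the covering radius over the visual tokens at the cost of fewer prompt-aligned tokens, which is the kind of trade-off the abstract describes in terms of budget allocation.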
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Computational Geometry and Mesh Generation
Methods: Pruning
