OTPrune: Distribution-Aligned Visual Token Pruning via Optimal Transport

Xiwen Chen; Wenhui Zhu; Gen Li; Xuanzhao Dong; Yujian Xiong; Hao Wang; Peijie Qiu; Qingquan Song; Zhipeng Wang; Shao Tang; Yalin Wang; and Abolfazl Razi

arXiv:2602.20205·cs.CV·April 2, 2026

TL;DR

OTPrune is a training-free visual token pruning method for multi-modal large language models (MLLMs). It uses optimal transport to align the pruned token distribution with the full one, reducing inference cost while maintaining performance.

Contribution

The paper introduces a distribution-alignment framework for token pruning based on optimal transport. It pairs the framework with a tractable submodular objective whose monotonicity and submodularity are proved, giving a principled basis for efficient and stable pruning.
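To make the role of submodularity concrete, here is a minimal sketch of greedy subset selection under a cardinality budget. It does not reproduce OTPrune's objective: the facility-location function, the cosine similarity, and the names `similarity_matrix`, `facility_location`, and `greedy_select` are all stand-ins chosen for illustration. What carries over is the general pattern: for any monotone submodular objective, the classical greedy procedure attains at least a (1 - 1/e) fraction of the optimal value.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def similarity_matrix(tokens):
    """Pairwise similarities, clamped at zero so the objective stays monotone."""
    n = len(tokens)
    return [[max(0.0, cosine(tokens[i], tokens[j])) for j in range(n)]
            for i in range(n)]


def facility_location(sim, selected):
    """Stand-in objective (NOT the paper's): F(S) = sum_i max_{j in S} sim[i][j]."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim, k):
    """Greedily pick up to k indices, each time adding the largest marginal gain."""
    n = len(sim)
    selected = []
    best_cover = [0.0] * n  # best similarity of each token to the selected set
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: how much coverage improves per token.
            gain = sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best_cover = [max(best_cover[i], sim[i][best_j]) for i in range(n)]
    return selected


# Hypothetical toy "tokens": two near-duplicate pairs in 2-D.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
sim = similarity_matrix(tokens)
kept = greedy_select(sim, 2)
print(kept, facility_location(sim, kept))
```

On this toy input the greedy pass keeps one token from each near-duplicate pair, which is the qualitative behavior a representativeness-driven pruner aims for.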

Findings

01 OTPrune outperforms state-of-the-art methods in efficiency-performance tradeoffs.

02 The method preserves local diversity and global representativeness of visual tokens.

03 Theoretical analysis confirms the stability and semantic faithfulness of the pruning process.

Abstract

Multi-modal large language models (MLLMs) achieve strong visual-language reasoning but suffer from high inference cost due to redundant visual tokens. Recent work explores visual token pruning to accelerate inference, but existing pruning methods overlook the underlying distributional structure of visual representations. We propose OTPrune, a training-free framework that formulates pruning as distribution alignment via optimal transport (OT). By minimizing the 2-Wasserstein distance between the full and pruned token distributions, OTPrune preserves both local diversity and global representativeness while reducing inference cost. Moreover, we derive a tractable submodular objective that enables efficient optimization, and theoretically prove its monotonicity and submodularity, providing a principled foundation for stable and efficient pruning. We further provide a comprehensive…
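For reference, the 2-Wasserstein distance named in the abstract has a standard discrete form. The notation below is an assumption made for illustration and is not taken from the paper: $x_1,\dots,x_N$ denote the visual token embeddings, $S$ is the index set of kept tokens, and both sets carry uniform weights. The paper may weight tokens differently.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N}\delta_{x_i}, \qquad
\nu_S = \frac{1}{|S|}\sum_{j\in S}\delta_{x_j}, \qquad
S \subseteq \{1,\dots,N\}

W_2^2(\mu,\nu_S) = \min_{\pi \in \Pi(\mu,\nu_S)}
  \sum_{i=1}^{N}\sum_{j\in S} \pi_{ij}\,\lVert x_i - x_j\rVert_2^2,
\qquad
\Pi(\mu,\nu_S) = \Big\{\pi \ge 0 \;:\;
  \sum_{j\in S}\pi_{ij} = \tfrac{1}{N},\;
  \sum_{i=1}^{N}\pi_{ij} = \tfrac{1}{|S|}\Big\}
```

Under this reading, pruning to a budget of $k$ tokens amounts to choosing $S$ with $|S| = k$ so that $W_2^2(\mu,\nu_S)$ is as small as possible.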

Code & Models

Repositories

xiwenc1/OTPrune (GitHub)
