VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

Sihan Yang; Runsen Xu; Chenhang Cui; Tai Wang; Dahua Lin; Jiangmiao Pang

arXiv:2508.05211·cs.CV·September 12, 2025

TL;DR

VFlowOpt is a token pruning framework for large multimodal models (LMMs) that cuts computational cost by pruning redundant visual tokens. Its pruning hyperparameters are tuned by a visual information flow-guided optimization so that model performance is largely preserved.

Contribution

The paper introduces an importance map derivation process and a progressive pruning module with a recycling mechanism, and it tunes the hyperparameters of the pruning strategy with a visual information flow-guided optimization method.

Findings

01 Prunes 90% of visual tokens with minimal performance loss.

02 Achieves an 89% reduction in KV-Cache memory.

03 Enables 3.8 times faster inference.
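
To give a feel for how a visual-token pruning ratio relates to KV-cache size, here is a back-of-the-envelope sketch in Python. It assumes that KV-cache memory grows linearly with the number of cached tokens and that every layer caches the same pruned sequence; a progressive, multi-stage pruner would not satisfy the second assumption exactly. The function name and the token counts are hypothetical and are not taken from the paper.

```python
def kv_cache_reduction(n_visual, n_text, prune_ratio):
    """Fraction of KV-cache memory saved under a simplified linear model.

    Assumption (not from the paper): cache size is proportional to the
    number of tokens kept, and text tokens are never pruned.
    """
    if not 0.0 <= prune_ratio <= 1.0:
        raise ValueError("prune_ratio must be in [0, 1]")
    total_before = n_visual + n_text
    total_after = n_visual * (1.0 - prune_ratio) + n_text
    return 1.0 - total_after / total_before


# Hypothetical prompt: 2000 visual tokens, 100 text tokens, 90% pruned.
saving = kv_cache_reduction(n_visual=2000, n_text=100, prune_ratio=0.9)
print(f"{saving:.3f}")
```

Under this simplified model the saving is always somewhat below the pruning ratio, because unpruned text tokens still occupy the cache.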

Abstract

Large Multimodal Models (LMMs) excel in visual-language tasks by leveraging numerous visual tokens for fine-grained visual information, but this token redundancy results in significant computational costs. Previous research aimed at reducing visual tokens during inference typically leverages importance maps derived from attention scores among vision-only tokens or vision-language tokens to prune tokens across one or multiple pruning stages. Despite this progress, pruning frameworks and strategies remain simplistic and insufficiently explored, often resulting in substantial performance degradation. In this paper, we propose VFlowOpt, a token pruning framework that introduces an importance map derivation process and a progressive pruning module with a recycling mechanism. The hyperparameters of its pruning strategy are further optimized by a visual information flow-guided method.…
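
As a rough illustration of importance-map pruning, the sketch below keeps the highest-scoring visual tokens and merges the discarded ones into a single extra token. This is a minimal sketch under stated assumptions, not the paper's algorithm: the abstract does not specify how the importance map is derived or how the recycling mechanism works, so the importance scores are taken as given and the importance-weighted merge is only one hypothetical way to "recycle" pruned tokens. The name `prune_tokens` and its parameters are illustrative.

```python
def prune_tokens(tokens, importance, keep_ratio=0.1, recycle=True):
    """Keep the top-scoring tokens; optionally merge the rest into one token.

    tokens:     list of equal-length feature vectors (lists of floats)
    importance: one non-negative score per token (assumed precomputed,
                e.g. from attention scores)
    Returns (kept_tokens, kept_indices).
    """
    if len(tokens) != len(importance):
        raise ValueError("tokens and importance must have the same length")
    n_keep = max(1, round(len(tokens) * keep_ratio))

    # Rank token indices by descending importance.
    order = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    keep_idx = sorted(order[:n_keep])  # restore original token order
    drop_idx = order[n_keep:]
    kept = [tokens[i] for i in keep_idx]

    # Hypothetical recycling step: an importance-weighted average of the
    # dropped tokens is appended as a single summary token.
    if recycle and drop_idx:
        total = sum(importance[i] for i in drop_idx)
        if total > 0:
            weights = [importance[i] / total for i in drop_idx]
        else:
            weights = [1.0 / len(drop_idx)] * len(drop_idx)
        dim = len(tokens[0])
        merged = [
            sum(w * tokens[i][d] for w, i in zip(weights, drop_idx))
            for d in range(dim)
        ]
        kept.append(merged)
    return kept, keep_idx


# Toy usage: four 2-d tokens, keep half of them.
toks = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
scores = [0.1, 0.5, 0.3, 0.1]
kept, idx = prune_tokens(toks, scores, keep_ratio=0.5)
print(idx, len(kept))
```

A progressive variant would apply a step like this at several stages with different keep ratios; those ratios are the kind of hyperparameter the abstract says is optimized.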
