VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui, Tai Wang, Dahua Lin, Jiangmiao Pang

TL;DR
VFlowOpt is a token pruning framework for large multimodal models (LMMs) that cuts computational cost by pruning visual tokens. Its pruning hyperparameters are tuned by a visual information flow-guided optimization so that model performance is preserved.
Contribution
VFlowOpt introduces an importance map derivation process and a progressive pruning module with a recycling mechanism, and it optimizes the hyperparameters of the pruning strategy with a visual information flow-guided method.
Findings
Prunes 90% of visual tokens with minimal performance loss.
Achieves 89% reduction in KV-Cache memory.
Enables 3.8 times faster inference.
Abstract
Large Multimodal Models (LMMs) excel in visual-language tasks by leveraging numerous visual tokens for fine-grained visual information, but this token redundancy results in significant computational costs. Previous research aimed at reducing visual tokens during inference typically leverages importance maps derived from attention scores among vision-only tokens or vision-language tokens to prune tokens across one or multiple pruning stages. Despite this progress, pruning frameworks and strategies remain simplistic and insufficiently explored, often resulting in substantial performance degradation. In this paper, we propose VFlowOpt, a token pruning framework that introduces an importance map derivation process and a progressive pruning module with a recycling mechanism. The hyperparameters of its pruning strategy are further optimized by a visual information flow-guided method.…
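To make the idea of progressive pruning with recycling concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not specify how importance scores are derived or how pruned tokens are recycled, so both are assumptions here: scores come from a caller-supplied `score_fn`, and recycling is modeled as a score-weighted average of the dropped tokens appended as one extra token. The names `prune_with_recycling`, `progressive_prune`, `score_fn`, and `keep_ratios` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def prune_with_recycling(
    tokens: Sequence[Vector],
    scores: Sequence[float],
    keep_ratio: float,
) -> Tuple[List[Vector], List[int]]:
    """Keep the highest-scoring tokens and fold the rest into one recycled token.

    Returns (new_tokens, kept_indices). The recycled token, if any, comes last.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return [], []

    n_keep = max(1, round(len(tokens) * keep_ratio))
    # Rank indices by importance (highest first), then restore original order
    # for the kept tokens so their relative positions are preserved.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    dropped = ranked[n_keep:]

    new_tokens = [list(tokens[i]) for i in kept]
    if dropped:
        # ASSUMPTION: recycling = score-weighted mean of the dropped tokens.
        total = sum(scores[i] for i in dropped)
        if total > 0:
            weights = [scores[i] / total for i in dropped]
        else:
            weights = [1.0 / len(dropped)] * len(dropped)
        dim = len(tokens[0])
        recycled = [
            sum(w * tokens[i][d] for w, i in zip(weights, dropped))
            for d in range(dim)
        ]
        new_tokens.append(recycled)
    return new_tokens, kept


def progressive_prune(
    tokens: Sequence[Vector],
    score_fn: Callable[[List[Vector]], List[float]],
    keep_ratios: Sequence[float],
) -> List[Vector]:
    """Apply several pruning stages, re-scoring the surviving tokens each time."""
    current = [list(t) for t in tokens]
    for ratio in keep_ratios:
        current, _ = prune_with_recycling(current, score_fn(current), ratio)
    return current
```

In this sketch, `keep_ratios` plays the role of the per-stage pruning hyperparameters that the abstract says are optimized. The optimization procedure itself is not shown.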
