FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models

Jintao Tong; Wenwei Jin; Pengda Qin; Anqi Li; Yixiong Zou; Yuhong Li; Yuhua Li; Ruixuan Li

arXiv:2505.19536 · cs.CV · November 25, 2025

TL;DR

FlowCut is an information-flow-based pruning method for vision-language models. It removes redundant visual tokens to cut computational cost while maintaining or improving performance.

Contribution

The paper proposes FlowCut, a novel pruning framework based on information flow analysis, addressing limitations of single-layer attention scores in identifying redundancy.

Findings

01. Outperforms prior state-of-the-art methods by 1.6% on LLaVA-1.5-7B at an 88.9% token reduction.

02. Outperforms prior state-of-the-art methods by 4.3% on LLaVA-NeXT-7B at a 94.4% token reduction.

03. Delivers a 3.2x speed-up in the pre-filling stage.
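
To make the reduction percentages concrete, the short sketch below derives them from token counts. The counts used here (576 visual tokens pruned to 64 for LLaVA-1.5, and 2880 pruned to 160 for LLaVA-NeXT) are assumptions that do not appear on this page, and `reduction_pct` is a hypothetical helper name.

```python
def reduction_pct(total_tokens: int, kept_tokens: int) -> float:
    """Percentage of visual tokens removed, rounded to one decimal place."""
    if not 0 <= kept_tokens <= total_tokens or total_tokens == 0:
        raise ValueError("expected 0 <= kept_tokens <= total_tokens and total_tokens > 0")
    return round(100.0 * (1.0 - kept_tokens / total_tokens), 1)

# Assumed token budgets (not stated in the source page).
print(reduction_pct(576, 64))    # 88.9
print(reduction_pct(2880, 160))  # 94.4
```

Under these assumed budgets, the reported reductions correspond to keeping roughly one visual token in nine and one in eighteen, respectively.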

Abstract

Large vision-language models (LVLMs) excel at multimodal understanding but suffer from high computational costs due to redundant vision tokens. Existing pruning methods typically rely on single-layer attention scores to rank and prune redundant visual tokens to solve this inefficiency. However, as the interaction between tokens and layers is complicated, this raises a basic question: Is such a simple single-layer criterion sufficient to identify redundancy? To answer this question, we rethink the emergence of redundant visual tokens from a fundamental perspective: information flow, which models the interaction between tokens and layers by capturing how information moves between tokens across layers. We find (1) the CLS token acts as an information relay, which can simplify the complicated flow analysis; (2) the redundancy emerges progressively and dynamically via layer-wise attention…
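
The single-layer criterion that the abstract calls into question can be illustrated with a minimal sketch: rank visual tokens by the attention they receive from the CLS token in one layer and keep the top few. This shows the baseline style of pruning, not FlowCut's own method, whose details are cut off above. The function name `prune_by_cls_attention` and the toy scores are hypothetical.

```python
from typing import List

def prune_by_cls_attention(cls_attention: List[float], keep: int) -> List[int]:
    """Return indices of the `keep` visual tokens with the highest CLS
    attention in a single layer, preserving their original order."""
    if not 0 <= keep <= len(cls_attention):
        raise ValueError("keep must be between 0 and the number of tokens")
    # Rank token indices by attention score, highest first.
    ranked = sorted(range(len(cls_attention)),
                    key=lambda i: cls_attention[i],
                    reverse=True)
    # Keep the top-`keep` tokens and restore positional order.
    return sorted(ranked[:keep])

# Toy attention from the CLS token to six visual tokens in one layer.
scores = [0.05, 0.30, 0.10, 0.25, 0.02, 0.28]
kept = prune_by_cls_attention(scores, keep=3)
print(kept)  # [1, 3, 5]
```

Because the ranking uses one layer's scores only, a token that looks unimportant in that layer is dropped even if it would matter in another layer, which is the limitation the abstract raises.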

Code & Models

Repositories

tungchintao/flowcut (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Softmax · Attention Is All You Need · Pruning