Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM

Penghao Wu; Lewei Lu; Ziwei Liu

arXiv:2505.15816·cs.CV·May 22, 2025

TL;DR

This paper introduces ProxyV, a method that reduces computation in large multimodal models (LMMs) by processing vision tokens with lighter operations, achieving efficiency gains without performance loss.

Contribution

The paper identifies computation-level redundancy in vision tokens and proposes ProxyV, a novel approach that uses proxy vision tokens to reduce computational load in LMMs.

Findings

01 ProxyV reduces visual token computation without performance loss.

02 Combining ProxyV with token reduction methods further improves efficiency.

03 ProxyV can lead to performance gains in certain scenarios.

Abstract

Large multimodal models excel in multimodal tasks but face significant computational challenges due to excessive computation on visual tokens. Unlike token reduction methods that focus on token-level redundancy, we identify and study the computation-level redundancy on vision tokens to ensure no information loss. Our key insight is that vision tokens from the pretrained vision encoder do not necessarily require all the heavy operations (e.g., self-attention, FFNs) in decoder-only LMMs and could be processed more lightly with proper designs. We designed a series of experiments to discover and progressively squeeze out the vision-related computation redundancy. Based on our findings, we propose ProxyV, a novel approach that utilizes proxy vision tokens to alleviate the computational burden on original vision tokens. ProxyV enhances efficiency without compromising performance and can even…
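To make the notion of computation-level redundancy concrete, the following is a back-of-the-envelope sketch, not the paper's architecture or its measured numbers. It assumes a standard decoder layer (self-attention plus a two-matrix FFN) and compares running every vision token through that layer against a hypothetical alternative in which only a small set of proxy tokens takes the heavy path while the full set of vision tokens receives a light two-layer MLP update. The function names, the MLP design, and all sizes (`n_vision`, `n_proxy`, `d_hidden`, and so on) are illustrative assumptions.

```python
def layer_flops(n_tokens: int, d_model: int, d_ff: int) -> int:
    """Rough multiply-accumulate count for one decoder layer over n_tokens.

    Counts the Q/K/V/output projections (4 * n * d^2), the attention score
    and value products (2 * n^2 * d), and a two-matrix FFN (2 * n * d * d_ff).
    Gated-FFN variants, norms, and text tokens are ignored.
    """
    projections = 4 * n_tokens * d_model * d_model
    attention = 2 * n_tokens * n_tokens * d_model
    ffn = 2 * n_tokens * d_model * d_ff
    return projections + attention + ffn


def light_update_flops(n_tokens: int, d_model: int, d_hidden: int) -> int:
    """Cost of a hypothetical two-layer MLP (d_model -> d_hidden -> d_model)."""
    return 2 * n_tokens * d_model * d_hidden


def compare(n_vision: int, n_proxy: int, d_model: int, d_ff: int, d_hidden: int):
    """Return (full_cost, reduced_cost, reduced / full) for one layer."""
    # Baseline: every vision token goes through attention and the FFN.
    full = layer_flops(n_vision, d_model, d_ff)
    # Assumed alternative: heavy path on proxies only, light path on all tokens.
    reduced = layer_flops(n_proxy, d_model, d_ff) + light_update_flops(
        n_vision, d_model, d_hidden
    )
    return full, reduced, reduced / full


if __name__ == "__main__":
    # All sizes below are illustrative assumptions, not values from the paper.
    full, reduced, ratio = compare(
        n_vision=576, n_proxy=36, d_model=4096, d_ff=11008, d_hidden=1024
    )
    print(f"full: {full:.3e}  reduced: {reduced:.3e}  ratio: {ratio:.3f}")
```

Because no vision token is dropped in this accounting, the saving comes entirely from the cheaper per-token path, which is the distinction the abstract draws between computation-level and token-level redundancy.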

Code & Models

Repositories

penghao-wu/proxyv (PyTorch, official)

Taxonomy

Topics: Cryptography and Data Security · Parallel Computing and Optimization Techniques · Advanced Surface Polishing Techniques

Methods: Focus · Umbrella Reinforcement Learning