Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM
Penghao Wu, Lewei Lu, Ziwei Liu

TL;DR
This paper introduces ProxyV, a method to reduce computation in large multimodal models by processing vision tokens more lightly, achieving efficiency gains without performance loss.
Contribution
The paper identifies computation-level redundancy in vision tokens and proposes ProxyV, a novel approach that uses proxy vision tokens to reduce computational load in LMMs.
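To make the idea of computation-level redundancy concrete, here is a rough back-of-the-envelope FLOP model for one decoder layer. It is a sketch under our own assumptions, not figures from the paper. It assumes a LLaMA-style layer with a gated FFN, counts 2 FLOPs per multiply-add, and ignores causal-mask savings. It also assumes a hypothetical setting where only pooled proxy tokens plus text tokens pass through the heavy operations. The helper names `decoder_layer_flops` and `heavy_path_ratio` and all sizes are illustrative. The lightweight update of the original vision tokens is deliberately excluded, so the ratio is a lower bound on the remaining heavy-op cost, not a measured ProxyV speedup.

```python
def decoder_layer_flops(seq_len: int, d_model: int, d_ffn: int) -> int:
    """Rough FLOPs for one LLaMA-style decoder layer (2 FLOPs per multiply-add).

    Ignores norms, rotary embeddings, and the savings from causal masking.
    """
    proj = 8 * seq_len * d_model * d_model   # Q, K, V, O projections
    attn = 4 * seq_len * seq_len * d_model   # QK^T scores plus weighted sum of V
    ffn = 6 * seq_len * d_model * d_ffn      # gated FFN: gate, up, down matrices
    return proj + attn + ffn


def heavy_path_ratio(n_vision: int, n_text: int, pool: int,
                     d_model: int, d_ffn: int) -> float:
    """Heavy-op FLOPs with pooled proxies + text, relative to full vision + text.

    Hypothetical setting: every `pool` vision tokens are merged into one proxy.
    Excludes the lightweight update applied to the original vision tokens.
    """
    full = decoder_layer_flops(n_vision + n_text, d_model, d_ffn)
    proxied = decoder_layer_flops(n_vision // pool + n_text, d_model, d_ffn)
    return proxied / full


# Assumed sizes: 2880 vision tokens, 64 text tokens, 7B-class hidden dims.
print(f"{heavy_path_ratio(2880, 64, 16, 4096, 11008):.3f}")
```

Because vision tokens usually far outnumber text tokens, both the linear terms (projections, FFN) and the quadratic attention term are dominated by the vision-token count, which is why lightening their processing is attractive.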
Findings
ProxyV reduces visual token computation without performance loss
Combining ProxyV with token reduction methods further improves efficiency
ProxyV can lead to performance gains in certain scenarios
Abstract
Large multimodal models excel in multimodal tasks but face significant computational challenges due to excessive computation on visual tokens. Unlike token reduction methods that focus on token-level redundancy, we identify and study the computation-level redundancy on vision tokens to ensure no information loss. Our key insight is that vision tokens from the pretrained vision encoder do not necessarily require all the heavy operations (e.g., self-attention, FFNs) in decoder-only LMMs and could be processed more lightly with proper designs. We designed a series of experiments to discover and progressively squeeze out the vision-related computation redundancy. Based on our findings, we propose ProxyV, a novel approach that utilizes proxy vision tokens to alleviate the computational burden on original vision tokens. ProxyV enhances efficiency without compromising performance and can even…
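The abstract describes the proxy mechanism only at a high level. The NumPy sketch below illustrates one plausible reading of it, under assumptions that are ours rather than the paper's. Proxies are formed by average-pooling consecutive groups of vision tokens, a 1D stand-in for spatial pooling. Only proxies and text tokens go through the heavy self-attention (single head, no causal mask, FFN omitted). Each original vision token is then refreshed by cheap cross-attention to the updated proxies, followed by a single linear map. The names `make_proxies`, `proxy_layer`, and `w_light` are hypothetical, and the paper's actual lightweight module may differ.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention (no causal mask, for brevity)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def make_proxies(vision: np.ndarray, pool: int) -> np.ndarray:
    """Hypothetical: average-pool consecutive groups of `pool` vision tokens."""
    n, d = vision.shape
    if n % pool != 0:
        raise ValueError("number of vision tokens must be divisible by pool")
    return vision.reshape(n // pool, pool, d).mean(axis=1)


def proxy_layer(vision, text, pool, w_light):
    """One hypothetical layer: heavy ops on proxies + text, light update on vision.

    vision:  (N_v, d) original vision tokens
    text:    (N_t, d) text tokens
    w_light: (d, d) assumed lightweight projection for the vision-token update
    """
    proxies = make_proxies(vision, pool)
    # Heavy path: self-attention over proxies and text only (FFN omitted).
    seq = np.concatenate([proxies, text], axis=0)
    seq = seq + attention(seq, seq, seq)
    n_p = proxies.shape[0]
    new_proxies, new_text = seq[:n_p], seq[n_p:]
    # Light path: each original vision token reads from the updated proxies,
    # costing O(N_v * N_p) instead of the O(N_v^2) of full self-attention.
    guided = attention(vision, new_proxies, new_proxies)
    new_vision = vision + guided @ w_light
    return new_vision, new_text


rng = np.random.default_rng(0)
vision = rng.standard_normal((16, 8))
text = rng.standard_normal((4, 8))
w_light = 0.1 * rng.standard_normal((8, 8))
new_vision, new_text = proxy_layer(vision, text, pool=4, w_light=w_light)
print(new_vision.shape, new_text.shape)  # (16, 8) (4, 8)
```

The design point the sketch tries to capture: the heavy path scales with the number of proxies plus text tokens rather than the full vision-token count. Meanwhile, the original vision tokens keep their full resolution and still receive updated context through the light path, so no tokens are discarded.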
