Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
Xuyang Liu, Ziming Wang, Junjie Chen, Yuhang Han, Yingyao Wang, Jiale Yuan, Jun Song, Siteng Huang, Honggang Chen

TL;DR
This paper introduces GlobalCom2, a plug-and-play token compression framework for high-resolution vision-language models that uses global thumbnails to guide efficient local crop processing, significantly reducing computation while maintaining performance.
Contribution
It proposes a novel global compression framework leveraging thumbnails to guide token compression in high-resolution LVLMs, addressing multi-view and dynamic cropping challenges.
Findings
GlobalCom2 retains over 90% of the original model's performance while compressing 90% of visual tokens.
It reduces FLOPs to 9.1% and peak memory to 60% of the uncompressed baseline.
It is effective on high-resolution, multi-view LVLMs.
Abstract
Large vision-language models (LVLMs) excel at visual understanding, but face efficiency challenges due to quadratic complexity in processing long multi-modal contexts. While token compression can reduce computational costs, existing approaches are designed for single-view LVLMs and fail to consider the unique multi-view characteristics of high-resolution LVLMs with dynamic cropping. Existing methods treat all tokens uniformly, but our analysis reveals that global thumbnails can naturally guide the compression of local crops by providing holistic context for informativeness evaluation. In this paper, we first analyze the dynamic cropping strategy, revealing both the complementary nature of thumbnails and crops and the distinctive characteristics of different crops. Based on our observations, we propose "Global Compression Commander" (i.e., GlobalCom), a…
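The abstract is truncated before it describes the method, so the following is only a minimal sketch of the general idea it states: thumbnail-derived scores decide how much of each local crop to keep. It is not the authors' algorithm. The function names, the proportional budget rule, and the score inputs (`crop_scores` for per-crop importance, `crop_token_scores` for per-token importance) are all assumptions made for illustration.

```python
# Illustrative sketch only: hypothetical names and a simple proportional
# allocation rule, not the procedure from the paper.

def allocate_crop_budgets(crop_scores, tokens_per_crop, keep_ratio):
    """Split a global token budget across crops in proportion to their scores."""
    n = len(crop_scores)
    total_budget = round(keep_ratio * tokens_per_crop * n)
    total_score = sum(crop_scores)
    if total_score <= 0:
        weights = [1.0 / n] * n  # fall back to a uniform split
    else:
        weights = [s / total_score for s in crop_scores]
    # Floor each share and never exceed the number of tokens in a crop.
    budgets = [min(tokens_per_crop, int(total_budget * w)) for w in weights]
    # Hand any leftover tokens to the highest-scoring crops that still have room.
    leftover = total_budget - sum(budgets)
    order = sorted(range(n), key=lambda i: crop_scores[i], reverse=True)
    while leftover > 0:
        progressed = False
        for i in order:
            if leftover == 0:
                break
            if budgets[i] < tokens_per_crop:
                budgets[i] += 1
                leftover -= 1
                progressed = True
        if not progressed:
            break
    return budgets


def keep_top_tokens(token_scores, budget):
    """Return the indices of the `budget` highest-scoring tokens, in original order."""
    ranked = sorted(range(len(token_scores)), key=lambda i: token_scores[i], reverse=True)
    return sorted(ranked[:budget])


def compress(crop_token_scores, crop_scores, keep_ratio):
    """Choose which token indices to retain in each crop (equal-sized crops assumed)."""
    tokens_per_crop = len(crop_token_scores[0])
    budgets = allocate_crop_budgets(crop_scores, tokens_per_crop, keep_ratio)
    return [keep_top_tokens(s, b) for s, b in zip(crop_token_scores, budgets)]


if __name__ == "__main__":
    # Two crops of four tokens each; the first crop is scored as more informative.
    kept = compress(
        [[0.1, 0.9, 0.5, 0.3], [0.2, 0.8, 0.1, 0.4]],
        crop_scores=[3.0, 1.0],
        keep_ratio=0.5,
    )
    print(kept)
```

In this toy setup the more informative crop keeps three of its four tokens and the other keeps one, so the overall retention ratio is still 50%. How the scores are actually obtained from the thumbnail is not stated in the text above and is left as an input here.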
Taxonomy
Topics: Algorithms and Data Compression
Methods: Softmax · Attention Is All You Need
