Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models

Xuyang Liu; Ziming Wang; Junjie Chen; Yuhang Han; Yingyao Wang; Jiale Yuan; Jun Song; Siteng Huang; Honggang Chen

arXiv:2501.05179·cs.CV·January 14, 2026

TL;DR

This paper introduces GlobalCom2, a plug-and-play token compression framework for high-resolution vision-language models that uses global thumbnails to guide efficient local crop processing, significantly reducing computation while maintaining performance.

Contribution

It proposes a novel global compression framework leveraging thumbnails to guide token compression in high-resolution LVLMs, addressing multi-view and dynamic cropping challenges.

Findings

01. Maintains over 90% of baseline performance at 90% token compression.

02. Reduces FLOPs to 9.1% and peak memory to 60% of the uncompressed baseline.

03. Demonstrates effectiveness on high-resolution multi-view models.

Abstract

Large vision-language models (LVLMs) excel at visual understanding but face efficiency challenges due to the quadratic complexity of processing long multi-modal contexts. While token compression can reduce computational costs, existing approaches are designed for single-view LVLMs and fail to consider the unique multi-view characteristics of high-resolution LVLMs with dynamic cropping. Existing methods treat all tokens uniformly, but our analysis reveals that global thumbnails can naturally guide the compression of local crops by providing holistic context for informativeness evaluation. In this paper, we first analyze the dynamic cropping strategy, revealing both the complementary nature of thumbnails and crops and the distinctive characteristics across different crops. Based on our observations, we propose "Global Compression Commander" (i.e., GlobalCom²), a…
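The idea of letting a global view guide how aggressively each local crop is compressed can be sketched in a few lines. The sketch below is illustrative only and is not the authors' implementation: the function names `allocate_crop_budgets` and `compress_crop`, the proportional budget rule, and the top-k selection are all assumptions, and the importance scores are taken as given inputs (the truncated abstract does not say how they are computed).

```python
import numpy as np

def allocate_crop_budgets(crop_scores, total_keep):
    """Split a total token budget across crops in proportion to each crop's score.

    crop_scores: one non-negative importance value per crop (assumed here to be
    derived from the global thumbnail; how it is derived is not specified).
    """
    scores = np.asarray(crop_scores, dtype=float)
    ideal = scores * total_keep / scores.sum()   # fractional share per crop
    budgets = np.floor(ideal).astype(int)
    # Hand leftover tokens to the crops with the largest fractional parts.
    remainder = int(total_keep - budgets.sum())
    order = np.argsort(-(ideal - budgets))
    budgets[order[:remainder]] += 1
    return budgets

def compress_crop(tokens, token_scores, keep):
    """Keep the `keep` highest-scoring tokens of one crop, preserving their order."""
    idx = np.argsort(-np.asarray(token_scores, dtype=float))[:keep]
    idx.sort()                                   # restore original token order
    return tokens[idx]

# Hypothetical example: 4 crops of 8 tokens each, keep 10 tokens overall.
rng = np.random.default_rng(0)
crops = [rng.normal(size=(8, 4)) for _ in range(4)]
token_scores = [rng.random(8) for _ in range(4)]
budgets = allocate_crop_budgets([4, 3, 2, 1], total_keep=10)
compressed = [compress_crop(c, s, k) for c, s, k in zip(crops, token_scores, budgets)]
```

In this toy setup, crops judged more informative retain more tokens, while less informative crops are compressed harder. The sketch assumes each crop's budget does not exceed its token count.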

Code & Models

Repositories

xuyang-liu16/globalcom2
PyTorch · Official

Videos

Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models

Taxonomy

Topics: Magnetic confinement fusion research · Algorithms and Data Compression

Methods: Softmax · Attention Is All You Need