UniCompress: Token Compression for Unified Vision-Language Understanding and Generation

Ziyao Wang; Chen Chen; Jingtao Li; Weiming Zhuang; Jiabo Huang; Ang Li; Lingjuan Lyu

arXiv:2603.11320 · cs.CV · March 13, 2026

Open Access

TL;DR

UniCompress is a lightweight, modular token compression method that reduces visual tokens in unified vision-language models, significantly improving efficiency with minimal performance loss.

Contribution

It introduces a novel plug-in compression mechanism guided by learnable meta tokens, enabling efficient token reduction without full model retraining.
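
The summary does not describe how the meta tokens guide compression, so the following is only a generic sketch of one way a small set of query vectors can pool a longer token sequence with softmax attention. It is an assumption for illustration, not the UniCompress architecture: the function names, the toy vectors, and the use of single-head dot-product attention are all hypothetical, and the "meta tokens" are fixed here rather than learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def compress(visual_tokens, meta_tokens):
    """Pool visual tokens into len(meta_tokens) vectors.

    Each meta token acts as an attention query over all visual tokens
    and emits one attention-weighted average of them.
    """
    dim = len(meta_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    compressed = []
    for query in meta_tokens:
        weights = softmax([dot(query, tok) * scale for tok in visual_tokens])
        pooled = [
            sum(w * tok[d] for w, tok in zip(weights, visual_tokens))
            for d in range(dim)
        ]
        compressed.append(pooled)
    return compressed


# Toy data (hypothetical): 8 two-dimensional "visual tokens".
visual = [
    [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0],
    [2.0, 0.0], [0.0, 2.0], [1.0, 2.0], [2.0, 1.0],
]
# Two stand-in meta tokens; in a real model these would be trained parameters.
meta = [[1.0, 0.0], [0.0, 1.0]]

out = compress(visual, meta)
print(len(visual), "->", len(out))  # 8 tokens pooled into 2, a 4x reduction
```

In this sketch the output length depends only on the number of meta tokens, which is why a module of this kind could sit between an image tokenizer and a language model without changing either. Whether UniCompress works this way is not stated on this page.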

Findings

01 Reduces visual tokens by up to 4 times

02 Reduces inference latency and training cost

03 Maintains performance with minimal degradation
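
To put the first finding in perspective, the short calculation below estimates how a 4 times reduction in visual tokens changes total sequence length and the number of query-key pairs in full self-attention, which grows with the square of sequence length. The token counts (1024 visual, 64 text) are hypothetical values chosen for illustration and are not taken from the paper.

```python
def compressed_token_count(num_visual_tokens: int, ratio: int) -> int:
    """Visual tokens left after compressing by `ratio` (ceiling division)."""
    return -(-num_visual_tokens // ratio)


def attention_pair_count(seq_len: int) -> int:
    """Query-key pairs in full self-attention over `seq_len` tokens."""
    return seq_len * seq_len


# Hypothetical sizes for illustration only; not reported by the paper.
visual_tokens = 1024
text_tokens = 64
ratio = 4  # the "up to 4 times" reduction from the findings

before = visual_tokens + text_tokens
after = compressed_token_count(visual_tokens, ratio) + text_tokens

print("sequence length:", before, "->", after)
print("attention pair ratio:",
      attention_pair_count(before) / attention_pair_count(after))
```

Because text tokens are not compressed, the overall sequence shrinks by less than 4 times (1088 to 320 here), while the attention pair count still drops by more than an order of magnitude under these assumed sizes. Actual latency gains depend on the model and hardware, and this sketch does not estimate them.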

Abstract

Unified models aim to support both understanding and generation by encoding images into discrete tokens and processing them alongside text within a single autoregressive framework. This unified design offers architectural simplicity and cross-modal synergy, which facilitates shared parameterization, consistent training objectives, and seamless transfer between modalities. However, the large number of visual tokens required by such models introduces substantial computation and memory overhead, and this inefficiency directly hinders deployment in resource-constrained scenarios such as embodied AI systems. In this work, we propose UniCompress, a unified token compression algorithm that significantly reduces the visual token count while preserving performance on both image understanding and generation tasks. Our method introduces a plug-in compression and decompression mechanism guided with…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning