OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

Keda Tao; Kele Shao; Bohan Yu; Weiqiang Wang; Jian Liu; Huan Wang

arXiv:2511.14582 · cs.CV · April 21, 2026

TL;DR

OmniZip is a training-free, audio-guided token compression framework that accelerates omnimodal large language models. It dynamically prunes video tokens according to salient audio cues, which yields a 3.42× inference speedup and a 1.4× reduction in memory usage.

Contribution

It introduces a training-free method for jointly compressing audio and video tokens. The method speeds up inference and reduces memory usage while maintaining model performance.

Findings

01. Achieves a 3.42× inference speedup
02. Reduces memory usage by 1.4×
03. Maintains model performance without additional training
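The page does not say how these two figures were measured. If we assume both are ratios of the baseline model's value to the value with OmniZip on the same workload, they translate into the fractions below. Here T (inference time) and M (memory) are illustrative symbols and do not appear in the source.

```latex
\begin{aligned}
\frac{T_{\text{OmniZip}}}{T_{\text{base}}} &= \frac{1}{3.42} \approx 0.292 &&\Rightarrow\ \text{about } 70.8\% \text{ less inference time},\\
\frac{M_{\text{OmniZip}}}{M_{\text{base}}} &= \frac{1}{1.4} \approx 0.714 &&\Rightarrow\ \text{about } 28.6\% \text{ less memory}.
\end{aligned}
```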

Abstract

Omnimodal large language models (OmniLLMs) have recently attracted increasing research attention for unified audio-video understanding. However, the high computational cost of processing long joint audio-video token sequences has become a key bottleneck. Existing token compression methods do not address the emerging need to compress multimodal tokens jointly. To bridge this gap, we present OmniZip, a training-free, audio-guided audio-visual token-compression framework that optimizes multimodal token representation and accelerates model inference. Specifically, OmniZip first identifies salient audio tokens. It then computes an audio retention score for each time group to capture information density, which dynamically guides video token pruning and preserves cues from audio anchors enhanced by cross-modal similarity. For each time window, OmniZip compresses the video tokens…
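The abstract's pipeline can be illustrated with a minimal NumPy sketch. The steps are: pick salient audio tokens, score each time group by its share of salient audio, turn those scores into per-group video budgets, and keep the video tokens most similar to the group's audio anchors. This is not the code in kd-tao/OmniZip. Several pieces are hypothetical stand-ins chosen for illustration:

- the per-token audio saliency input;
- the global top-ratio rule for choosing salient audio;
- the proportional budget split;
- ranking video tokens by maximum cosine similarity to salient audio tokens.

The function names and the `top_ratio` and `keep_ratio` parameters are also hypothetical. Because the abstract is truncated, the sketch does not reproduce how OmniZip actually compresses video within each window.

```python
# Hypothetical sketch of audio-guided video token pruning in the spirit of the
# OmniZip abstract; not the authors' code. All names and rules are assumptions.
import numpy as np


def salient_audio_mask(audio_saliency, top_ratio):
    """Mark the top `top_ratio` fraction of all audio tokens as salient.

    audio_saliency: (G, A) per-token importance scores for G time groups
    with A audio tokens each (how these scores are obtained is assumed).
    """
    flat = audio_saliency.ravel()
    k = max(1, int(round(top_ratio * flat.size)))
    top = np.argsort(flat)[::-1][:k]  # indices of the k highest scores
    mask = np.zeros(flat.size, dtype=bool)
    mask[top] = True
    return mask.reshape(audio_saliency.shape)


def allocate_video_budget(scores, video_per_group, keep_ratio):
    """Split a global video-token budget across groups in proportion to scores.

    Flooring and clipping to [1, video_per_group] mean the total kept is
    only approximately keep_ratio of all video tokens.
    """
    total_keep = int(round(keep_ratio * scores.size * video_per_group))
    # scores.sum() > 0 because salient_audio_mask always selects >= 1 token.
    budget = np.floor(total_keep * scores / scores.sum()).astype(int)
    return np.clip(budget, 1, video_per_group)  # every group keeps >= 1 token


def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a (N, D) and b (M, D)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


def audio_guided_prune(audio_feats, video_feats, audio_saliency,
                       top_ratio=0.3, keep_ratio=0.4):
    """Return kept video-token indices per time group, plus scores and budgets.

    audio_feats: (G, A, D), video_feats: (G, V, D), audio_saliency: (G, A).
    """
    mask = salient_audio_mask(audio_saliency, top_ratio)
    scores = mask.mean(axis=1)  # retention score: share of salient audio per group
    budget = allocate_video_budget(scores, video_feats.shape[1], keep_ratio)
    kept = []
    for g in range(video_feats.shape[0]):
        anchors = audio_feats[g][mask[g]]  # salient audio tokens act as anchors
        if anchors.shape[0] == 0:  # no salient audio here: fall back to the mean
            anchors = audio_feats[g].mean(axis=0, keepdims=True)
        sim = cosine_sim(video_feats[g], anchors).max(axis=1)  # best anchor match
        top = np.argsort(sim)[::-1][: budget[g]]
        kept.append(np.sort(top))  # keep surviving tokens in their original order
    return kept, scores, budget


# Usage on random data (shapes are arbitrary).
rng = np.random.default_rng(0)
G, A, V, D = 4, 8, 16, 32
audio = rng.normal(size=(G, A, D))
video = rng.normal(size=(G, V, D))
saliency = rng.random((G, A))
kept, scores, budget = audio_guided_prune(audio, video, saliency)
for g, idx in enumerate(kept):
    print(f"group {g}: retention={scores[g]:.2f}, kept {len(idx)}/{V} video tokens")
```

In this sketch, groups with more salient audio receive larger video budgets, which is one simple reading of "dynamically guiding video token pruning". A real system would also have to decide how pruned tokens are merged or discarded.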

Code & Models

Repositories

kd-tao/OmniZip (GitHub)