OmniSelect: Dynamic Modality-Aware Token Compression for Efficient Omni-modal Large Language Models

Morunliu Yang,Ruotao Xu,Le Li,Yue Wang,Jianxin Zhang,Juntao Li,Yihang Lou,Siwei Feng,Peifeng Li

arXiv:2605.18041·cs.CV·May 19, 2026

TL;DR

OmniSelect introduces a training-free, modality-adaptive token pruning framework for OmniLLMs, dynamically selecting compression strategies based on cross-modal relevance to improve efficiency without sacrificing performance.

Contribution

The paper presents a training-free, dynamic token pruning method that adapts to the relative importance of each modality in multimodal inputs, improving efficiency in OmniLLMs.

Findings

01. Achieves significant token reduction while maintaining performance.

02. Effectively models modality preferences for dynamic token pruning.

03. No additional training required for the pruning framework.

Abstract

Omnimodal large language models (OmniLLMs) have recently gained increasing attention for unified audio-video understanding. However, processing long multimodal token sequences introduces substantial computational overhead, making efficient token compression crucial. Existing methods typically rely on fixed, modality-specific guidance, which fails to account for the varying importance of modalities across different queries. To address this limitation, we propose OmniSelect, a training-free, modality-adaptive token pruning framework that dynamically selects appropriate compression strategies for multimodal inputs. Specifically, we leverage a lightweight AudioCLIP model to estimate cross-modal relevance and categorize each input into one of three pruning regimes: Audio-Centric, Video-Centric, and Uniform pruning. Based on these relevance scores, OmniSelect further performs fine-grained…
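
The regime selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_regime`, the idea of comparing two scalar relevance scores, and the `margin` threshold are all assumptions made here for illustration. The abstract states only that AudioCLIP-based relevance is used to assign each input to Audio-Centric, Video-Centric, or Uniform pruning.

```python
from enum import Enum


class Regime(Enum):
    """The three pruning regimes named in the abstract."""
    AUDIO_CENTRIC = "audio-centric"
    VIDEO_CENTRIC = "video-centric"
    UNIFORM = "uniform"


def select_regime(audio_rel: float, video_rel: float, margin: float = 0.1) -> Regime:
    """Pick a regime from two query-relevance scores.

    audio_rel / video_rel: hypothetical scalar relevance of the audio and
    video streams to the query (e.g. a similarity score in [0, 1]).
    margin: hypothetical decision threshold, not a value from the paper.
    """
    gap = audio_rel - video_rel
    if gap > margin:
        return Regime.AUDIO_CENTRIC
    if gap < -margin:
        return Regime.VIDEO_CENTRIC
    # Neither modality clearly dominates, so treat both alike.
    return Regime.UNIFORM


if __name__ == "__main__":
    print(select_regime(0.8, 0.3).value)
    print(select_regime(0.3, 0.8).value)
    print(select_regime(0.5, 0.45).value)
```

A margin-based rule like this falls back to Uniform pruning whenever the two scores are close, which is one simple way to avoid committing to a modality on weak evidence. How the paper actually sets this boundary is not stated in the text above.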
