OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs

Feng Chen; Yefei He; Shaoxuan He; Yuanyu He; Jing Liu; Lequan Lin; Akide Liu; Zhaoyang Li; Jiyuan Zhang; Zhenbang Sun; Bohan Zhuang; Qi Wu

arXiv:2511.12201 · cs.CV · November 20, 2025

TL;DR

OmniSparse introduces a training-aware, fine-grained sparse attention method for long-video multimodal large language models (MLLMs). It selects tokens dynamically and delivers up to 2.7x faster prefill and a 2.4x memory reduction during decoding while matching the performance of full attention.

Contribution

It proposes a framework of adaptive token- and head-level selection mechanisms that operate during both training and inference, bridging the training-inference gap left by inference-only sparse attention methods.

Findings

01

Achieves up to 2.7x speedup during prefill

02

Achieves a 2.4x memory reduction during decoding

03

Matches the performance of full attention

Abstract

Existing sparse attention methods primarily target inference-time acceleration by selecting critical tokens under predefined sparsity patterns. However, they often fail to bridge the training-inference gap and lack the capacity for fine-grained token selection across multiple dimensions such as queries, key-values (KV), and heads, leading to suboptimal performance and limited acceleration gains. In this paper, we introduce OmniSparse, a training-aware fine-grained sparse attention framework for long-video MLLMs, which operates in both training and inference with dynamic token budget allocation. Specifically, OmniSparse contains three adaptive and complementary mechanisms: (1) query selection via lazy-active classification, retaining active queries that capture broad semantic similarity while discarding most lazy ones that focus on limited local context and exhibit high functional…
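The abstract names lazy-active query classification but is cut off before giving the exact criterion, so the sketch below is only an illustration under an assumed rule, not the authors' method. It assumes a query is "lazy" if at least `tau` of its attention mass falls on its `window` most recent keys (a stand-in for "limited local context"), and it keeps every active query plus an evenly spaced fraction `keep_lazy_ratio` of the lazy ones. The names `classify_queries`, `select_queries`, `window`, `tau`, and `keep_lazy_ratio` are hypothetical. For clarity it computes full causal attention weights first, which a practical sparse method would avoid.

```python
import math
from typing import List

# Hypothetical sketch: the paper's actual lazy/active criterion is not given in the
# abstract. Here a query is "lazy" if most of its attention mass stays on its
# `window` most recent keys (limited local context); otherwise it is "active".


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q: List[List[float]], k: List[List[float]]) -> List[List[float]]:
    """Row i holds query i's softmax weights over keys 0..i (causal mask)."""
    d = len(q[0])
    rows = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in range(i + 1)]
        rows.append(softmax(scores))
    return rows


def classify_queries(attn: List[List[float]], window: int = 2, tau: float = 0.8) -> List[bool]:
    """Return True ("lazy") when >= tau of a query's mass lies on its last `window` keys."""
    lazy = []
    for i, row in enumerate(attn):
        local_mass = sum(row[max(0, i - window + 1): i + 1])
        lazy.append(local_mass >= tau)
    return lazy


def select_queries(lazy: List[bool], keep_lazy_ratio: float = 0.25) -> List[int]:
    """Keep all active queries plus an evenly spaced fraction of the lazy ones."""
    active = [i for i, is_lazy in enumerate(lazy) if not is_lazy]
    lazy_idx = [i for i, is_lazy in enumerate(lazy) if is_lazy]
    # Clamp so a ratio above 1.0 cannot request more lazy queries than exist.
    n_keep = min(len(lazy_idx), math.ceil(keep_lazy_ratio * len(lazy_idx)))
    step = len(lazy_idx) / n_keep if n_keep else 0.0
    kept_lazy = [lazy_idx[int(j * step)] for j in range(n_keep)]
    return sorted(active + kept_lazy)


if __name__ == "__main__":
    # Hand-written causal attention rows for four queries.
    attn = [[1.0], [0.5, 0.5], [0.1, 0.1, 0.8], [0.4, 0.3, 0.2, 0.1]]
    lazy = classify_queries(attn)
    print(lazy)                  # [True, True, True, False]
    print(select_queries(lazy))  # [0, 3]
```

With `window=2`, the first two queries are trivially lazy because every key they can see lies inside the window. Evenly spacing the retained lazy queries is one simple way to keep a sparse sample of local behaviour; other retention rules would fit the same interface.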

Videos

OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs (Underline)

Taxonomy

Topics: Image and Video Quality Assessment · Advanced Neural Network Applications · Visual Attention and Saliency Detection