AdaSpark: Adaptive Sparsity for Efficient Long-Video Understanding

Handong Li; Zikang Liu; Longteng Guo; Tongtian Yue; Yepeng Tang; Xinxin Zhu; Chuanyang Zheng; Ziming Wang; Zhibin Wang; Jun Song; Cheng Yu; Bo Zheng; Jing Liu

arXiv:2604.08077·cs.CV·May 11, 2026

TL;DR

AdaSpark is an adaptive sparsity framework for long-video understanding that reduces FLOPs by up to 57% while preserving task performance and long-range temporal modeling.

Contribution

AdaSpark proposes an adaptive sparsity method for Video-LLMs built on two co-designed, context-aware components: Adaptive Cube-Selective Attention (AdaS-Attn) and Adaptive Token-Selective FFN (AdaS-FFN).

Findings

01. Reduces FLOPs by up to 57% without performance loss.

02. Maintains fine-grained perception and long-range dependencies.

03. Validates effectiveness on hour-scale video benchmarks.

Abstract

Processing long-form videos with Video Large Language Models (Video-LLMs) is computationally prohibitive. Current efficiency methods often compromise fine-grained perception through irreversible information disposal or inhibit long-range temporal modeling via rigid, predefined sparse patterns. This paper introduces AdaSpark, an adaptive sparsity framework designed to address these limitations. AdaSpark first partitions video inputs into 3D spatio-temporal cubes. It then employs two co-designed, context-aware components: (1) Adaptive Cube-Selective Attention (AdaS-Attn), which adaptively selects a subset of relevant video cubes to attend for each query token, and (2) Adaptive Token-Selective FFN (AdaS-FFN), which selectively processes only the most salient tokens within each cube. An entropy-based (Top-p) selection mechanism adaptively allocates computational resources based on input…
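
The Top-p selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `top_p_select`, the softmax normalization of scores, and the threshold `p=0.9` are all assumptions made for illustration, since this excerpt does not say how relevance scores are computed. The sketch keeps the smallest set of candidates (cubes for a query, or tokens within a cube) whose normalized score mass reaches `p`, so a peaked, low-entropy distribution keeps few candidates and a flat, high-entropy one keeps many.

```python
import math


def top_p_select(scores, p=0.9):
    """Return sorted indices of the smallest candidate set whose softmax mass reaches p.

    `scores` are hypothetical relevance scores, one per candidate (e.g. one per
    video cube for a given query token). Softmax normalization is an assumption.
    """
    if not scores:
        return []

    # Numerically stable softmax over the candidate scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Visit candidates from most to least probable, accumulating mass until p is reached.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected, mass = [], 0.0
    for i in order:
        selected.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return sorted(selected)


# Peaked scores: one candidate dominates, so only it is kept.
peaked = top_p_select([10.0, 0.0, 0.0, 0.0], p=0.9)

# Flat scores: no candidate dominates, so all four are needed to reach p.
flat = top_p_select([0.0, 0.0, 0.0, 0.0], p=0.9)
```

Unlike a fixed Top-k budget, the number of kept candidates here varies with the shape of the score distribution, which is the sense in which the selection adapts to the input.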
