AdaFocus: Adaptive Relevance-Diversity Sampling with Zero-Cache Look-back for Efficient Long Video Understanding

Xiao Yang; Yingzhe Ma; Haoxuan Yu; Zixin Li; Ning Qin

arXiv:2605.12954 · cs.CV · May 14, 2026

TL;DR

AdaFocus introduces a progressive evidence acquisition framework for long video understanding, combining adaptive sampling and zero-cache disk retrieval to improve efficiency and accuracy.

Contribution

The paper proposes a query-aware adaptive relevance-diversity sampler (AdaRD) and an on-demand evidence refinement mechanism, which together enable scalable long-video reasoning without exhaustive preloading.

Findings

01. Achieves +2.59 accuracy on VideoMME
02. Improves mIoU by +8.39 on Charades-STA
03. Reduces visual token consumption by ~33x

Abstract

Long video understanding is heavily bottlenecked by a rigid one-shot paradigm: existing methods either densely encode videos at prohibitive memory and latency costs, or aggressively compress them into sparse frame sets that irreversibly discard fine-grained evidence needed for downstream reasoning. Consequently, current models struggle to simultaneously balance temporal coverage, visual details, and computational efficiency. We propose AdaFocus, an efficient framework that rethinks long-video understanding as progressive evidence acquisition rather than one-pass encoding. AdaFocus relies on two tightly coupled components. First, a Query-Aware Adaptive Relevance-Diversity sampler (AdaRD) produces a compact yet informative video preview, adaptively switching to global clustering when the query lacks reliable local grounding. Second, instead of caching exhaustive frame sequences in…
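The abstract describes AdaRD only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes frame and query embeddings are already available, uses greedy maximal marginal relevance as a stand-in for the relevance-diversity trade-off, and falls back to a small spherical k-means when no frame matches the query well. The names `select_frames` and `cluster_representatives` and the parameters `lam` and `min_peak` are hypothetical.

```python
import numpy as np

def cluster_representatives(f, k, iters=10):
    """Fallback: pick up to k frames that cover the video (spherical k-means).

    f is an (n, d) array of L2-normalised frame embeddings.
    """
    n = len(f)
    # Deterministic init: evenly spaced frames serve as the initial centres.
    centers = f[np.linspace(0, n - 1, k).astype(int)].copy()
    for _ in range(iters):
        assign = np.argmax(f @ centers.T, axis=1)
        for c in range(k):
            members = f[assign == c]
            if len(members):
                m = members.mean(axis=0)
                centers[c] = m / (np.linalg.norm(m) + 1e-12)
    reps = []
    for c in range(k):
        idx = np.flatnonzero(assign == c)
        if len(idx):  # empty clusters contribute no representative
            reps.append(int(idx[np.argmax(f[idx] @ centers[c])]))
    return sorted(reps)

def select_frames(frame_emb, query_emb, k, lam=0.5, min_peak=0.3):
    """Greedy relevance-diversity selection with a clustering fallback.

    Assumption: lam weights relevance against redundancy, and min_peak is the
    cosine similarity below which the query is treated as ungrounded.
    """
    # Normalise so that dot products are cosine similarities.
    f = frame_emb / np.linalg.norm(frame_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    rel = f @ q
    k = min(k, len(f))

    # No frame matches the query well: ignore relevance and cover the video.
    if rel.max() < min_peak:
        return cluster_representatives(f, k)

    selected = [int(np.argmax(rel))]
    max_sim = f @ f[selected[0]]  # similarity to the nearest selected frame
    while len(selected) < k:
        score = lam * rel - (1 - lam) * max_sim
        score[selected] = -np.inf  # never pick the same frame twice
        nxt = int(np.argmax(score))
        selected.append(nxt)
        max_sim = np.maximum(max_sim, f @ f[nxt])
    return sorted(selected)
```

A lower `lam` favours diverse frames over near-duplicates of the best match, and a higher `lam` favours relevance. The returned indices are sorted so the preview keeps temporal order.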
