LDDR: Linear-DPP-Based Dynamic-Resolution Frame Sampling for Video MLLMs

Jingfeng Chen; Jiawen Qian; Wendi Deng; Yinuo Guo; Jiaqi Yu; Sicong Leng; Raghuveer Thirukovalluru; Bhuwan Dhingra

arXiv:2605.11477·cs.CV·May 13, 2026

TL;DR

LDDR is a training-free frame sampling method for video understanding in multimodal large language models. It selects informative frames under a limited visual-token budget, improving both efficiency and accuracy.

Contribution

The paper introduces a query-aware, DPP-based frame sampling framework with dynamic resolution allocation that outperforms existing methods across multiple benchmarks.

Findings

01. LDDR achieves a 3x runtime speedup over standard DPP methods.

02. It outperforms baselines by 2.5 points under budget constraints.

03. It improves video understanding across various MLLM backbones.

Abstract

Video understanding in multimodal large language models requires selecting informative frames from long, redundant videos under limited visual-token budgets. Existing methods often rely on uniform sampling, point-wise relevance scoring, chunk-wise selection, or agentic exploration, which either miss global dependencies or introduce substantial overhead. We propose LDDR (Linear DPP-Based Dynamic Resolution), a training-free, plug-and-play, and budget-aware video frame sampling framework. LDDR performs query-aware Determinantal Point Process (DPP) frame selection in a task-conditioned feature space, achieving a 3x runtime speedup over standard DPP baselines. It further introduces a Group DPP importance metric to guide frame retention and dynamic resolution allocation, assigning more tokens to informative, non-redundant frames while downscaling or pruning less useful ones. Across four…
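
To make the idea of query-aware DPP frame selection concrete, here is a minimal sketch of the standard greedy MAP procedure for a DPP whose kernel is weighted by query relevance. It is not the paper's linear-time algorithm or its Group DPP metric, which the abstract does not specify. The function name `greedy_dpp_select`, the exponentiated-cosine relevance score, and the kernel form `diag(rel) * S * diag(rel)` are all assumptions chosen for illustration.

```python
import numpy as np

def greedy_dpp_select(frame_feats, query_feat, k, eps=1e-10):
    """Greedy MAP inference for a query-weighted DPP.

    Illustrative only: a generic DPP baseline, not LDDR's algorithm.
    frame_feats: (n, d) array of frame embeddings.
    query_feat:  (d,) array holding the query embedding.
    Returns the sorted indices of at most k selected frames.
    """
    # L2-normalise so dot products are cosine similarities.
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)

    # Assumed relevance score: exponentiated cosine similarity to the query.
    rel = np.exp(f @ q)

    # Kernel L = diag(rel) * S * diag(rel), with S the frame-frame similarity.
    L = rel[:, None] * (f @ f.T) * rel[None, :]

    n = L.shape[0]
    steps = min(k, n)
    gains = np.diag(L).copy()      # marginal log-det gain of each candidate
    chol = np.zeros((steps, n))    # rows of an incrementally built Cholesky factor
    selected = []

    for t in range(steps):
        j = int(np.argmax(gains))
        if gains[j] <= eps:
            break                  # remaining frames are redundant with the selection
        selected.append(j)
        # Project every frame onto the newly selected one, given earlier picks.
        e = (L[j] - chol[:t].T @ chol[:t, j]) / np.sqrt(gains[j])
        chol[t] = e
        gains = gains - e ** 2     # redundant frames lose most of their gain
        gains[j] = -np.inf         # never pick the same frame twice
    return sorted(selected)

# Frames 0 and 1 are identical, so only one of them should be kept.
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
query = np.array([1.0, 1.0])
picked = greedy_dpp_select(feats, query, k=2)
```

In this toy example all three frames are equally relevant to the query, so the selection is driven by diversity alone: once frame 0 is chosen, its duplicate (frame 1) has essentially zero remaining gain and frame 2 is picked instead. A budget-aware sampler in the spirit of the abstract could then use such gains to decide which frames deserve more tokens, though how LDDR does this is not described here.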
