DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

Yudong Han; Qingpei Guo; Liyuan Pan; Liu Liu; Yu Guan; Ming Yang

arXiv:2411.12355·cs.CV·March 26, 2025

TL;DR

DynFocus introduces a dynamic encoding framework for LLM-based video understanding, effectively balancing detailed information preservation with memory efficiency by selectively encoding frames based on relevance.

Contribution

The paper proposes DynFocus, a novel dynamic cooperative network with modules for adaptive frame selection and encoding, improving memory efficiency in video question answering.

Findings

01 Achieves competitive performance on five benchmarks.

02 Effectively reduces token usage while maintaining accuracy.

03 Demonstrates the benefit of dynamic encoding in video understanding.

Abstract

The challenge in LLM-based video understanding lies in preserving visual and semantic information in long videos while maintaining a memory-affordable token count. However, redundancy and correspondence in videos have hindered the performance potential of existing methods. Through statistical learning on current datasets, we observe that redundancy occurs in both repeated and answer-irrelevant frames, and that the corresponding frames vary with different questions. This suggests the possibility of adopting dynamic encoding to balance detailed video information preservation with token budget reduction. To this end, we propose a dynamic cooperative network, DynFocus, for memory-efficient video encoding. Specifically, it comprises (i) a Dynamic Event Prototype Estimation (DPE) module that dynamically selects meaningful frames for question answering; (ii) a Compact Cooperative Encoding (CCE) module…
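The observation that redundancy comes from repeated and answer-irrelevant frames can be illustrated with a small sketch. This is not the paper's DPE or CCE module, whose details are not given here. It is a generic stand-in that assumes each frame and the question are already embedded as plain vectors, ranks frames by cosine similarity to the question, and skips near-duplicates of frames already kept. The names `select_frames`, `k`, and `dup_threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_frames(frame_embs, question_emb, k, dup_threshold=0.95):
    """Pick up to k frame indices that are question-relevant and not near-duplicates.

    Hypothetical helper for illustration only; it is not the DynFocus method.
    """
    # Rank frames from most to least similar to the question embedding.
    order = sorted(
        range(len(frame_embs)),
        key=lambda i: cosine(frame_embs[i], question_emb),
        reverse=True,
    )
    kept = []
    for i in order:
        if len(kept) == k:
            break
        # Drop a frame that nearly repeats one already kept (repeated-frame redundancy).
        if all(cosine(frame_embs[i], frame_embs[j]) < dup_threshold for j in kept):
            kept.append(i)
    # Return indices in temporal order.
    return sorted(kept)


# Toy embeddings: frame 1 nearly repeats frame 0; frame 2 is irrelevant to the question.
frames = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.7, 0.7]]
question = [1.0, 0.0]
print(select_frames(frames, question, k=2))  # [0, 3]
```

Because the ranking depends on the question embedding, a different question selects different frames from the same video, which is the behavior the abstract motivates with "the corresponding frames vary with different questions".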

Taxonomy

Topics: Data Stream Mining Techniques · Scientific Computing and Data Management