Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

Xiaoyi Zhang; Zhaoyang Jia; Zongyu Guo; Jiahao Li; Bin Li; Houqiang Li; Yan Lu

arXiv:2505.18079·cs.CV·November 4, 2025

TL;DR

This paper introduces Deep Video Discovery (DVD), an agentic search framework that uses LLMs and adaptive tool use to improve understanding of long-form videos, achieving state-of-the-art results on long-video benchmarks, including 74.2% accuracy on LVBench.

Contribution

The paper presents a novel agentic search approach that adaptively orchestrates tools for long video understanding, surpassing previous methods in accuracy.

Findings

01 Achieves 74.2% accuracy on the LVBench dataset.

02 Improves to 76.0% accuracy when transcripts are used.

03 Demonstrates superior performance over prior methods.

Abstract

Long-form video understanding presents significant challenges due to extensive temporal-spatial complexity and the difficulty of question answering under such extended contexts. While Large Language Models (LLMs) have demonstrated considerable advancements in video analysis capabilities and long context handling, they continue to exhibit limitations when processing information-dense hour-long videos. To overcome such limitations, we propose the Deep Video Discovery (DVD) agent to leverage an agentic search strategy over segmented video clips. Unlike previous video agents that rely on predefined workflows applied uniformly across different queries, our approach emphasizes the autonomous and adaptive nature of agents. By providing a set of search-centric tools on a multi-granular video database, our DVD agent leverages the advanced reasoning capability of the LLM to plan on its current…
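The abstract is cut off above, so the paper's actual tools and planner are not shown here. As a rough illustration of the general idea it describes (segmenting a video into clips, exposing search tools at more than one granularity, and letting a planner choose the next tool based on what it has seen so far), a minimal sketch might look like the following. Every name in it (`segment`, `VideoDB`, the `browse` and `search` tools, `scripted_planner`) is hypothetical, the clip captions are made-up toy data, and the planner is a fixed script standing in for an LLM.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Clip:
    start: float   # clip start time in seconds
    end: float     # clip end time in seconds
    caption: str   # assumed per-clip text description (toy data here)


def segment(duration: float, clip_len: float) -> list[tuple[float, float]]:
    """Split [0, duration) into consecutive spans of at most clip_len seconds."""
    spans = []
    start = 0.0
    while start < duration:
        spans.append((start, min(start + clip_len, duration)))
        start += clip_len
    return spans


@dataclass
class VideoDB:
    """Toy two-granularity database: whole-video summary and per-clip search."""
    clips: list[Clip]

    def summary(self) -> str:
        # Coarse granularity: one string covering the whole video.
        return " | ".join(c.caption for c in self.clips)

    def search(self, query: str, top_k: int = 2) -> list[Clip]:
        # Finer granularity: rank clips by word overlap with the query.
        words = set(query.lower().split())
        ranked = sorted(
            self.clips,
            key=lambda c: -len(words & set(c.caption.lower().split())),
        )
        return ranked[:top_k]


def run_agent(question, db, planner, max_steps=5):
    """Let the planner pick a tool each step until it decides to answer."""
    tools = {
        "browse": lambda _arg: db.summary(),
        "search": lambda arg: "; ".join(
            f"[{c.start:.0f}-{c.end:.0f}s] {c.caption}" for c in db.search(arg)
        ),
    }
    history = []  # (action, argument, observation) triples seen so far
    for _ in range(max_steps):
        action, arg = planner(question, history)
        if action == "answer":
            return arg, history
        history.append((action, arg, tools[action](arg)))
    return "no answer within step budget", history


def scripted_planner(question, history):
    """Stand-in for an LLM planner: browse, then search, then answer."""
    if not history:
        return ("browse", "")
    if len(history) == 1:
        return ("search", question)
    # Answer with the top-ranked clip from the last search observation.
    return ("answer", history[-1][2].split(";")[0])


captions = [
    "a chef chops onions",
    "the chef fries the onions in a pan",
    "guests eat dinner at the table",
]
db = VideoDB([Clip(s, e, c) for (s, e), c in zip(segment(30.0, 10.0), captions)])
answer, history = run_agent("when does the chef fry onions", db, scripted_planner)
print(answer)
```

In an adaptive setup, `scripted_planner` would be replaced by a model call that reads the question and the accumulated `history` and decides which tool to invoke next, so that different questions can lead to different tool sequences rather than one fixed workflow.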

Videos

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding · SlidesLive

Taxonomy

Methods: Sparse Evolutionary Training