Think, Then Verify: A Hypothesis-Verification Multi-Agent Framework for Long Video Understanding
Zheng Wang, Haoran Chen, Haoxuan Qin, Zhipeng Wei, Tianwen Qian, Cong Bai

TL;DR
This paper introduces VideoHV-Agent, a hypothesis-verification framework for long video question answering that improves accuracy, interpretability, and efficiency by emphasizing deliberate task formulation before evidence retrieval.
Contribution
The paper proposes a structured hypothesis-verification approach to long video understanding that reasons before it retrieves, aiming to reduce semantic drift and correlation-driven errors and to improve interpretability.
Findings
VideoHV-Agent achieves state-of-the-art accuracy on three benchmarks.
It improves the interpretability and logical soundness of the reasoning process.
It lowers computational cost relative to previous methods.
Abstract
Long video understanding is challenging due to dense visual redundancy, long-range temporal dependencies, and the tendency of chain-of-thought and retrieval-based agents to accumulate semantic drift and correlation-driven errors. We argue that long-video reasoning should begin not with reactive retrieval, but with deliberate task formulation: the model must first articulate what must be true in the video for each candidate answer to hold. This thinking-before-finding principle motivates VideoHV-Agent, a framework that reformulates video question answering as a structured hypothesis-verification process. Based on video summaries, a Thinker rewrites answer candidates into testable hypotheses, a Judge derives a discriminative clue specifying what evidence must be checked, a Verifier grounds and tests the clue using localized, fine-grained video content, and an Answer agent integrates…
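The four roles named in the abstract can be pictured as a short pipeline. The sketch below is only an illustration under loud assumptions: every function name, the `Hypothesis` type, and the keyword-matching logic are hypothetical stand-ins for what would be language-model and vision-model calls in the paper, and the fourth role is reduced to a simple arg-max because the abstract is truncated at that point. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Hypothesis:
    candidate: str   # the original answer option
    statement: str   # what must be true in the video if this option is correct


def thinker(question: str, candidates: List[str], summary: str) -> List[Hypothesis]:
    # Hypothetical stand-in for a model call: rewrite each answer candidate
    # as a testable statement about the video.
    return [Hypothesis(c, f"At some point in the video, {c}.") for c in candidates]


def judge(hypotheses: List[Hypothesis]) -> Dict[str, Set[str]]:
    # Hypothetical stand-in: keep, per candidate, only the words no other
    # candidate shares, as a crude "discriminative clue".
    words = {h.candidate: set(h.candidate.lower().split()) for h in hypotheses}
    clues = {}
    for cand, own in words.items():
        others = set().union(*(w for c, w in words.items() if c != cand))
        clues[cand] = own - others
    return clues


def verifier(clues: Dict[str, Set[str]], segments: List[str]) -> Dict[str, int]:
    # Hypothetical stand-in for fine-grained grounding: count how often each
    # clue word appears in captions of localized video segments.
    scores = {}
    for cand, clue in clues.items():
        scores[cand] = sum(
            1 for seg in segments for word in clue if word in seg.lower().split()
        )
    return scores


def answer_agent(scores: Dict[str, int]) -> str:
    # Simplified assumption: pick the candidate with the most supporting evidence.
    return max(scores, key=scores.get)


def run_pipeline(question: str, candidates: List[str],
                 summary: str, segments: List[str]) -> str:
    hypotheses = thinker(question, candidates, summary)   # think first
    clues = judge(hypotheses)                             # decide what to check
    scores = verifier(clues, segments)                    # then look at the video
    return answer_agent(scores)


# Toy usage with made-up captions in place of real video content.
result = run_pipeline(
    question="What does the person wash?",
    candidates=["the person washes a cup", "the person washes a plate"],
    summary="A person does chores in a kitchen.",
    segments=["a person picks up a plate", "the person rinses the plate in the sink"],
)
print(result)  # the person washes a plate
```

The point of the ordering is that the clue is fixed before any segment is inspected, so retrieval is driven by what would distinguish the candidates rather than by whatever content happens to correlate with the question.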
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling
