Video Evidence to Reasoning: Efficient Video Understanding via Explicit Evidence Grounding
Yanxiang Huang, Guohua Gao, Zhaoyang Wei, Jianyuan Ni

TL;DR
This paper introduces the Chain of Evidence framework for efficient and reliable video reasoning, combining a lightweight evidence grounding module and reinforcement learning to improve accuracy and reduce hallucinations in large vision-language models.
Contribution
The paper proposes the Chain of Evidence (CoE) framework, which pairs a dynamic evidence grounding module with an RL-based anchoring protocol to improve the accuracy and reliability of video reasoning.
Findings
Achieves state-of-the-art results on five benchmarks.
Significantly reduces hallucinations in video reasoning.
Demonstrates effective evidence grounding in large-scale datasets.
Abstract
Large Vision-Language Models (LVLMs) face a fundamental dilemma in video reasoning: they are caught between the prohibitive computational costs of verbose reasoning and the hallucination risks of efficient, ungrounded approaches. To resolve this, we introduce the Chain of Evidence (CoE), a novel framework that architecturally decouples and co-optimizes perceptual grounding and reasoning efficiency. CoE incorporates two core innovations: (1) A lightweight Evidence Grounding Module (EGM) that acts as a query-guided filter, dynamically identifying and extracting a compact set of high-fidelity visual evidence; and (2) An Evidence-Anchoring Protocol optimized via Reinforcement Learning. Crucially, we design a composite reward mechanism that enforces process alignment, compelling the model to strictly reference identified temporal anchors during deduction, thereby mitigating hallucinations.…
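The two components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how the EGM scores frames or how the composite reward is computed, so everything below is an assumption made for illustration. Here the query-guided filter is approximated by top-k cosine similarity between a query vector and per-frame vectors, and the reward is a weighted sum of answer correctness and the fraction of cited timestamps that match selected anchors. The function names, the `[t=...]` citation syntax, and the weights are all hypothetical.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_evidence(query_vec, frame_vecs, timestamps, k=2):
    """Hypothetical query-guided filter: keep the k frames most similar to the query.

    Returns the timestamps of the selected frames in temporal order.
    """
    scored = sorted(
        zip(timestamps, frame_vecs),
        key=lambda tv: cosine(query_vec, tv[1]),
        reverse=True,
    )
    return sorted(t for t, _ in scored[:k])


def anchoring_reward(reasoning, anchors, answer_correct, w_answer=1.0, w_anchor=0.5):
    """Hypothetical composite reward (weights are illustrative, not from the paper).

    Adds a correctness term to a grounding term: the fraction of timestamps
    cited in the reasoning (written as [t=...]) that are among the anchors.
    """
    cited = [float(m) for m in re.findall(r"\[t=(\d+(?:\.\d+)?)\]", reasoning)]
    grounded = sum(1 for t in cited if t in anchors) / len(cited) if cited else 0.0
    return w_answer * float(answer_correct) + w_anchor * grounded


# Toy data: four frames with 2-D features, and a query aligned with the first axis.
timestamps = [0.0, 5.0, 10.0, 15.0]
frame_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, -1.0]]
anchors = select_evidence([1.0, 0.0], frame_vecs, timestamps, k=2)

# One cited timestamp matches an anchor, the other does not.
reward = anchoring_reward(
    "The cup falls [t=10.0] after the push [t=3.0].", anchors, answer_correct=True
)
```

In this sketch a reasoning trace that cites a timestamp outside the selected evidence earns a smaller grounding term, which is one simple way to penalize ungrounded deduction; the paper's actual reward may be structured quite differently.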
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis
