Video Evidence to Reasoning: Efficient Video Understanding via Explicit Evidence Grounding

Yanxiang Huang; Guohua Gao; Zhaoyang Wei; Jianyuan Ni

arXiv:2601.07761 · cs.CV · January 13, 2026

TL;DR

This paper introduces the Chain of Evidence framework for efficient and reliable video reasoning, combining a lightweight evidence grounding module and reinforcement learning to improve accuracy and reduce hallucinations in large vision-language models.

Contribution

The paper proposes a novel CoE framework with a dynamic evidence grounding module and an RL-based anchoring protocol, advancing video reasoning accuracy and reliability.

Findings

01. Achieves state-of-the-art results on five benchmarks.

02. Significantly reduces hallucinations in video reasoning.

03. Demonstrates effective evidence grounding in large-scale datasets.

Abstract

Large Vision-Language Models (LVLMs) face a fundamental dilemma in video reasoning: they are caught between the prohibitive computational costs of verbose reasoning and the hallucination risks of efficient, ungrounded approaches. To resolve this, we introduce the Chain of Evidence (CoE), a novel framework that architecturally decouples and co-optimizes perceptual grounding and reasoning efficiency. CoE incorporates two core innovations: (1) A lightweight Evidence Grounding Module (EGM) that acts as a query-guided filter, dynamically identifying and extracting a compact set of high-fidelity visual evidence; and (2) An Evidence-Anchoring Protocol optimized via Reinforcement Learning. Crucially, we design a composite reward mechanism that enforces process alignment, compelling the model to strictly reference identified temporal anchors during deduction, thereby mitigating hallucinations.…
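
The two components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how the EGM scores frames or how the composite reward is composed, so everything here is an assumption made for illustration, including cosine-similarity scoring, top-k selection, the `[t=...]` citation format, the reward weights, and the hypothetical function names `select_evidence` and `anchoring_reward`.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_evidence(query_vec, frame_vecs, timestamps, k=2):
    """Query-guided filter (assumed form): keep the k frames most similar
    to the query embedding and return their timestamps in temporal order."""
    scored = sorted(
        zip(timestamps, frame_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return sorted(t for t, _ in scored[:k])


def anchoring_reward(answer_correct, reasoning, anchors, w_answer=1.0, w_anchor=0.5):
    """Composite reward (assumed form): answer correctness plus the fraction
    of timestamps cited as [t=...] in the reasoning that are selected anchors."""
    cited = [float(m) for m in re.findall(r"\[t=(\d+(?:\.\d+)?)\]", reasoning)]
    grounded = sum(1 for t in cited if t in anchors) / len(cited) if cited else 0.0
    return w_answer * float(answer_correct) + w_anchor * grounded


# Toy data: 2-D "embeddings" for four frames sampled at 0, 2, 4, and 6 seconds.
query = [1.0, 0.0]
frames = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [-1.0, 0.0]]
times = [0.0, 2.0, 4.0, 6.0]

anchors = select_evidence(query, frames, times, k=2)
trace = "The cup is lifted at [t=0] and set down at [t=4]; it falls at [t=6]."
reward = anchoring_reward(True, trace, anchors)
print(anchors, round(reward, 3))
```

In this toy setup, a reasoning trace that cites a timestamp outside the selected evidence set (here `[t=6]`) earns a lower anchoring term, which is one simple way a reward could discourage ungrounded references.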

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis