Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions

Kecheng Zhang; Zongxin Yang; Mingfei Han; Haihong Hao; Yunzhi Zhuge; Changlin Li; Junhan Zhao; Zhihui Li; Xiaojun Chang

arXiv:2604.18459 · cs.CV · April 21, 2026

TL;DR

This paper introduces a novel online video understanding framework with transparent reasoning and evidence-aligned response timing, addressing challenges of real-time analysis and decision transparency.

Contribution

It proposes EvidenceAlign, a framework with a transparent reasoning controller and hierarchical memory system for evidence-aligned, online video understanding.

Findings

01

Achieved 71.6% on StreamingBench, surpassing previous state-of-the-art.

02

Demonstrated precise response timing matching evidence appearance in videos.

03

Showed improved accuracy from 67.63% to 71.60% on StreamingBench with Thinking-QwenVL.

Abstract

Visual agents operating in the wild must respond to queries precisely when sufficient evidence first appears in a video stream, a critical capability that is overlooked by conventional video LLMs evaluated in offline settings. The shift to an online, streaming paradigm introduces significant challenges: a lack of decision transparency, the difficulty of aligning response timing with visual evidence, and the need to maintain a global, causally consistent understanding under tight computational budgets. To address these issues, we propose a novel framework that decouples reasoning control from memory integration. We introduce \textbf{\model{}}, an instantiation of this framework with two core components. First, the \emph{Active Thinking Decision Maker (ATDM)} is a transparent reasoning controller that externalizes its decision process using observable progress ($\rho$) and…

Videos

Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions · SlidesLive