StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video
Ao Li, Zihan Xiao, Zihao Yue, Boshen Xu, Linli Yao, Jiaze Li, Pei Fu, Jianzhong Ju, Jian Luan, Qin Jin

TL;DR
StreamPro introduces a new benchmark and training framework for proactive streaming video understanding, enabling models to make early, reliable decisions under partial observations.
Contribution
The paper presents StreamPro-Bench for evaluating proactive reasoning and proposes StreamPro, a two-stage training method with novel loss and reward designs.
Findings
StreamPro substantially outperforms previous methods on StreamPro-Bench.
It scores 41.5 on proactive performance, compared with 10.4 for the previous best method.
On StreamingBench-RTVU, it reaches 78.9, demonstrating strong real-time streaming capabilities.
Abstract
Proactive streaming video understanding requires models to continuously process video streams and decide when to respond, rather than merely what to respond. This naturally introduces a decision-making problem under partial observations, where models must balance early prediction against sufficient evidence. However, existing benchmarks largely follow a "see-then-answer" paradigm, where responses are triggered only after explicit evidence appears, effectively reducing proactive reasoning to delayed perception. As a result, they fail to evaluate a model's ability to make timely and reliable decisions under incomplete observations. Moreover, training proactive models is inherently challenging due to the extreme imbalance between silence and response signals in streaming trajectories, as well as the need to jointly optimize response correctness and timing. To address these challenges, we…
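
The silence/response imbalance mentioned in the abstract can be made concrete with a small sketch. The code below is an illustration only and is not the paper's loss or reward design, which the truncated abstract does not describe. It assumes a hypothetical trajectory of 20 decision steps in which the model should respond at exactly one step, and it uses a generic inverse-frequency weighted binary cross-entropy (a standard technique swapped in here for illustration). The function names `class_weights` and `weighted_bce` are hypothetical.

```python
import math

def class_weights(labels):
    """Inverse-frequency weights for binary labels (0 = stay silent, 1 = respond)."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        # Only one class present: nothing to rebalance.
        return {0: 1.0, 1: 1.0}
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}

def weighted_bce(probs, labels, weights):
    """Mean binary cross-entropy with a per-class weight on each step."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        log_lik = math.log(p) if y == 1 else math.log(1 - p)
        total += -weights[y] * log_lik
    return total / len(labels)

# Assumed toy trajectory: 19 silent steps, then one step where a response is due.
labels = [0] * 19 + [1]

# A degenerate policy that almost never responds (respond probability 0.01 everywhere).
always_silent = [0.01] * len(labels)

uniform = {0: 1.0, 1: 1.0}
weights = class_weights(labels)

unweighted = weighted_bce(always_silent, labels, uniform)
weighted = weighted_bce(always_silent, labels, weights)
```

Under uniform weights, the always-silent policy is penalized only lightly because 19 of the 20 steps are labeled silence; with inverse-frequency weights, the single missed response dominates the loss. Reweighting of this kind addresses only the imbalance, not the joint optimization of correctness and timing that the abstract also raises.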
