StreamReady: Learning What to Answer and When in Long Streaming Videos
Shehreen Azad, Vibhav Vineet, Yogesh Singh Rawat

TL;DR
StreamReady introduces a timing-aware approach for streaming video understanding, enabling models to answer questions precisely when sufficient evidence appears, improving real-time accuracy and responsiveness.
Contribution
The paper proposes the Answer Readiness Score (ARS) metric and the StreamReady framework, which unifies temporal reasoning with on-time answering in streaming videos, along with a new benchmark, ProReady-QA.
Findings
StreamReady outperforms prior methods on ProReady-QA.
ARS measures whether an answer arrives at the appropriate moment, not only whether it is correct.
The framework generalizes across multiple long-video benchmarks.
Abstract
Streaming video understanding often involves time-sensitive scenarios where models need to answer exactly when the supporting visual evidence appears: answering before the evidence appears reflects speculation, while answering after it has passed reduces real-time utility. To capture this behavior, we introduce a readiness-aware formulation of streaming video understanding with the Answer Readiness Score (ARS), a timing-aware objective with asymmetric early and late penalties. When combined with correctness, ARS defines an effective accuracy that measures not just whether a model is right, but whether it answers at the appropriate moment. Building on this formulation, we introduce StreamReady, a framework to unify temporal reasoning with on-time answering through a lightweight readiness mechanism that decides if sufficient evidence has been observed before responding. To evaluate this capability, we…
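The abstract does not give the ARS formula, so the following is only an illustrative sketch of how a timing score with asymmetric early and late penalties could combine with correctness into an effective accuracy. Everything specific here is an assumption made for illustration and not the paper's definition: the linear penalty shape, the `early_rate` and `late_rate` values, the choice to penalize early answers more steeply, the evidence window fields, and the product of timing score and correctness.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One answered question (hypothetical record layout)."""
    answer_time: float     # seconds into the stream when the model answered
    evidence_start: float  # when the supporting evidence first appears
    evidence_end: float    # when the supporting evidence is last visible
    correct: bool          # whether the answer itself was right


def readiness_score(answer_time: float,
                    evidence_start: float,
                    evidence_end: float,
                    early_rate: float = 0.2,
                    late_rate: float = 0.05) -> float:
    """Illustrative timing score in [0, 1]; NOT the paper's ARS formula.

    Answers inside the evidence window score 1.0. Outside it, the score
    decays linearly with the gap in seconds, using a different rate for
    early and late answers (assumed values).
    """
    if answer_time < evidence_start:
        gap = evidence_start - answer_time
        return max(0.0, 1.0 - early_rate * gap)
    if answer_time > evidence_end:
        gap = answer_time - evidence_end
        return max(0.0, 1.0 - late_rate * gap)
    return 1.0


def effective_accuracy(responses: list[Response]) -> float:
    """Mean of timing score over all questions, counting only correct answers.

    Multiplying correctness by the timing score is an assumption about how
    the two are combined.
    """
    if not responses:
        return 0.0
    total = 0.0
    for r in responses:
        if r.correct:
            total += readiness_score(r.answer_time, r.evidence_start, r.evidence_end)
    return total / len(responses)


if __name__ == "__main__":
    demo = [
        Response(answer_time=10.0, evidence_start=10.0, evidence_end=12.0, correct=True),
        Response(answer_time=8.0, evidence_start=10.0, evidence_end=12.0, correct=True),
        Response(answer_time=14.0, evidence_start=10.0, evidence_end=12.0, correct=False),
    ]
    print(effective_accuracy(demo))
```

Under these assumptions, a correct answer given two seconds early is discounted more than one given two seconds late, and an incorrect answer contributes nothing regardless of timing.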
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
