Don't Pause! Every prediction matters in a streaming video

Dibyadip Chatterjee; Zhanzhong Pang; Fadime Sener; Yale Song; Angela Yao

arXiv:2604.24317 · cs.CV · April 28, 2026

TL;DR

This paper introduces SPOT-Bench, a streaming VideoQA benchmark scored with a new metric, Timeliness-F1, and proposes AsynKV, a training-free model that improves real-time streaming perception and response behavior.

Contribution

The paper presents SPOT-Bench, a benchmark for evaluating streaming video models, and introduces AsynKV, an approach that improves streaming perception without additional training.

Findings

01. Offline models detect events reliably but spam predictions unprompted.

02. Post-training for silence reduces spamming but makes models unresponsive.

03. Half of the video segments require no response; this is called dead-time.

Abstract

Streaming video models should respond the moment an event unfolds, not after the moment has passed. Yet existing online VideoQA benchmarks remain largely retrospective. They pause the video at fixed timestamps, pose questions about current or past events, and score models only at those moments. This protocol leaves streaming predictions untested. To close this gap, we introduce SPOT-Bench, featuring multi-turn proactive queries that evaluate general streaming perception and assistive capabilities required by an always-on, real-time assistant. SPOT-Bench comes with Timeliness-F1, a consolidated metric that measures streaming predictions by their temporal precision and balanced coverage across the entire video. Our benchmark reveals: (i) offline models detect events reliably but spam predictions unprompted; (ii) post-training for silence reduces spamming but induces unresponsiveness;…
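The abstract does not give the definition of Timeliness-F1, so the sketch below is not the paper's metric. It is a generic, assumed stand-in: a tolerance-window precision/recall/F1 over streamed predictions, meant only to illustrate why scoring every prediction separates the failure modes in (i) and (ii). The function name `window_f1`, the event windows, and the three example prediction streams are all hypothetical.

```python
from typing import List, Tuple


def window_f1(pred_times: List[float],
              event_windows: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for streamed predictions against event windows.

    Assumed scoring rule (not from the paper): a prediction is correct if its
    timestamp falls inside some event window, and an event is covered if at
    least one prediction falls inside its window. Predictions emitted outside
    every window (i.e. during dead-time) count against precision.
    """
    def in_window(t: float, w: Tuple[float, float]) -> bool:
        return w[0] <= t <= w[1]

    correct = sum(any(in_window(t, w) for w in event_windows) for t in pred_times)
    covered = sum(any(in_window(t, w) for t in pred_times) for w in event_windows)

    precision = correct / len(pred_times) if pred_times else 0.0
    recall = covered / len(event_windows) if event_windows else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical 60-second stream containing two events; the rest is dead-time.
events = [(10.0, 15.0), (40.0, 45.0)]

spammy = [float(t) for t in range(0, 60, 2)]  # predicts every 2 s, prompted or not
silent = [12.0]                               # responds once, then stays quiet
timely = [11.0, 41.0]                         # one prediction per event

for name, preds in [("spammy", spammy), ("silent", silent), ("timely", timely)]:
    p, r, f = window_f1(preds, events)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Under this assumed rule, the spamming stream covers every event but is penalized on precision, while the overly silent stream is precise but misses an event. A retrospective protocol that scores only at fixed pause points would not register either behavior.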
