STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding

Junho Kim; Hosu Lee; James M. Rehg; Minsu Kim; Yong Man Ro

arXiv:2603.27593 · cs.CV · March 31, 2026

TL;DR

STRIDE introduces a span-structured sequence modeling approach with diffusion refinement to improve proactive response timing in streaming video understanding.

Contribution

The paper recasts proactive activation in streaming video as a span-level sequence modeling problem and refines the activation signals with a lightweight masked diffusion module.

Findings

01. STRIDE achieves more reliable proactive responses in streaming benchmarks.

02. It significantly improves when-to-speak decision quality.

03. The method enhances temporal coherence in online video understanding.

Abstract

Recent progress in video large language models (Video-LLMs) has enabled strong offline reasoning over long and complex videos. However, real-world deployments increasingly require streaming perception and proactive interaction, where video frames arrive online and the system must decide not only what to respond, but also when to respond. In this work, we revisit proactive activation in streaming video as a structured sequence modeling problem, motivated by the observation that temporal transitions in streaming video naturally form span-structured activation patterns. To capture this span-level structure, we model activation signals jointly over a sliding temporal window and update them iteratively as new frames arrive. We propose STRIDE (Structured Temporal Refinement with Iterative DEnoising), which employs a lightweight masked diffusion module at the activation interface to jointly…
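
The abstract is cut off before it describes the module in detail, so the following is only a toy sketch of the general idea it names: activation states over a sliding window are masked and then committed over a few refinement steps, most confident first. It is not the authors' implementation. Every name here (`toy_denoiser`, `denoise_window`, `stream`), the 0.7/0.3 blending weights, the per-frame relevance scores, and the choice to re-mask the whole window on each new frame are assumptions made for illustration; in particular, `toy_denoiser` is a hand-written stand-in for what would be a learned model.

```python
from collections import deque

MASK = None  # marks an activation state that has not been decided yet


def toy_denoiser(scores, states, i):
    """Hypothetical stand-in for a learned denoiser.

    Returns a pseudo-probability that frame i is active, blending the frame's
    own score with already-decided neighbours so contiguous spans are favoured.
    """
    neighbours = [states[j] for j in (i - 1, i + 1)
                  if 0 <= j < len(states) and states[j] is not MASK]
    if not neighbours:
        return scores[i]
    return 0.7 * scores[i] + 0.3 * sum(neighbours) / len(neighbours)


def denoise_window(scores, states, steps=3):
    """Commit masked positions over `steps` rounds, most confident first."""
    states = list(states)
    for step in range(steps):
        masked = [i for i, s in enumerate(states) if s is MASK]
        if not masked:
            break
        probs = {i: toy_denoiser(scores, states, i) for i in masked}
        # Unmask an even share per round; the last round commits the rest.
        k = -(-len(masked) // (steps - step))  # ceiling division
        ranked = sorted(masked, key=lambda i: abs(probs[i] - 0.5), reverse=True)
        for i in ranked[:k]:
            states[i] = 1 if probs[i] >= 0.5 else 0
    return states


def stream(frame_scores, window=4, steps=3):
    """Yield (frame_index, speak_now) as frames arrive one at a time."""
    scores = deque(maxlen=window)  # sliding window of recent frame scores
    for t, score in enumerate(frame_scores):
        scores.append(score)
        current = list(scores)
        # Assumption: the whole window is re-masked and refined on every frame.
        refined = denoise_window(current, [MASK] * len(current), steps)
        yield t, refined[-1] == 1
```

In this sketch, a weakly scored frame sitting between two strongly scored frames is pulled into the active span, which is the kind of span-level coherence the abstract motivates; a real system would replace the scores and the scorer with model outputs.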

Code & Models

Repositories

interlive-team/STRIDE (GitHub)

Models

interlive/STRIDE-2B (Hugging Face model)
