A Simple Baseline for Streaming Video Understanding

Yujiao Shen; Shulin Tian; Jingkang Yang; Ziwei Liu

arXiv:2604.02317 · cs.CV · April 3, 2026

TL;DR

A simple sliding-window baseline using recent frames with off-the-shelf VLMs matches or exceeds complex streaming video models, challenging the need for intricate memory mechanisms.

Contribution

Demonstrates that a straightforward recent-frame approach can outperform complex memory-based streaming video models, urging a reevaluation of current benchmarking practices.

Findings

1. SimpleStream achieves 67.7% average accuracy on OVO-Bench using only the 4 most recent frames.

2. The benefit of longer context depends on the backbone model; adding more frames does not always improve performance.

3. Adding historical context can improve recall but may weaken real-time perception.

Abstract

Recent streaming video understanding methods increasingly rely on complex memory mechanisms to handle long video streams. We challenge this trend with a simple finding: a sliding-window baseline that feeds only the most recent N frames to an off-the-shelf VLM already matches or surpasses published streaming models. We formalize this baseline as SimpleStream and evaluate it against 13 major offline and online video LLM baselines on OVO-Bench and StreamingBench. Despite its simplicity, SimpleStream delivers consistently strong performance. With only 4 recent frames, it reaches 67.7% average accuracy on OVO-Bench and 80.59% on StreamingBench. Controlled ablations further show that the value of longer context is backbone-dependent rather than uniformly increasing with model scale, and reveal a consistent perception-memory trade-off: adding more historical context can improve recall, but…
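The sliding-window baseline described in the abstract amounts to a fixed-size buffer of recent frames that is handed to a VLM at query time. The sketch below illustrates that idea under stated assumptions; it is not code from the SimpleStream repository. The names `RecentFrameWindow` and `answer_at`, and the `vlm` callable, are hypothetical, and frame decoding, prompt construction, and the actual VLM call are deliberately left out. The default `n=4` mirrors the frame count reported for the headline results.

```python
from collections import deque
from typing import Any, Callable, Deque, List, Sequence


class RecentFrameWindow:
    """Hypothetical helper: keeps only the most recent `n` frames of a stream."""

    def __init__(self, n: int = 4) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        # A deque with maxlen silently drops the oldest frame when full,
        # so nothing older than the last `n` frames is ever retained.
        self._frames: Deque[Any] = deque(maxlen=n)

    def push(self, frame: Any) -> None:
        # Called once per incoming frame from the stream.
        self._frames.append(frame)

    def frames(self) -> List[Any]:
        # Frames in oldest-to-newest order, as they would be passed to a VLM.
        return list(self._frames)


def answer_at(
    window: RecentFrameWindow,
    question: str,
    vlm: Callable[[Sequence[Any], str], str],
) -> str:
    """Answer a question using only the frames currently in the window.

    `vlm` is a placeholder for an off-the-shelf backbone call (assumption);
    no memory bank, retrieval, or compression is applied to older frames.
    """
    return vlm(window.frames(), question)


if __name__ == "__main__":
    window = RecentFrameWindow(n=4)
    for t in range(10):
        window.push(f"frame_{t}")
    print(window.frames())  # ['frame_6', 'frame_7', 'frame_8', 'frame_9']

    # Stand-in for a real VLM, used only to show what the model would receive.
    fake_vlm = lambda frames, q: f"{q} -> saw {len(frames)} frames ending at {frames[-1]}"
    print(answer_at(window, "What is happening?", fake_vlm))
```

Because eviction from a bounded deque is constant-time and the buffer never grows past `n`, the cost of each query stays fixed no matter how long the stream runs, which is the practical appeal of the baseline compared with memory mechanisms that accumulate state.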

Code & Models

Repositories

evolvinglmms-lab/SimpleStream (GitHub)