StreamingEval: A Unified Evaluation Protocol towards Realistic Streaming Video Understanding

Guowei Tang; Tianwen Qian; Huanran Zheng; Yifei Wang; Xiaoling Wang

arXiv:2603.21493 · cs.CV · March 24, 2026

Open Access

TL;DR

StreamingEval introduces a comprehensive evaluation framework for assessing the performance and deployability of streaming video understanding models under realistic resource constraints, highlighting current gaps and guiding future research.

Contribution

It presents a unified protocol for benchmarking streaming Video-LLMs that measures efficiency, storage, and accuracy jointly under standardized conditions.

Findings

01. Current models lag behind real-world streaming requirements.

02. Significant trade-offs exist between efficiency and accuracy.

03. Benchmarking reveals gaps in the deployability of existing Video-LLMs.

Abstract

Real-time, continuous understanding of visual signals is essential for real-world interactive AI applications and poses a fundamental system-level challenge. Existing research on streaming video understanding, however, typically focuses on isolated aspects such as question-answering accuracy under limited visual context or improvements in encoding efficiency, while largely overlooking practical deployability under realistic resource constraints. To bridge this gap, we introduce StreamingEval, a unified evaluation framework for assessing the streaming video understanding capabilities of Video-LLMs under realistic constraints. StreamingEval benchmarks both mainstream offline models and recent online video models under a standardized protocol, explicitly characterizing the trade-offs among efficiency, storage, and accuracy. Specifically, we adopt a fixed-capacity memory bank to normalize…
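The abstract breaks off before describing how the fixed-capacity memory bank is used. As a generic illustration only, such a bank can be sketched as a bounded buffer of per-frame features that evicts old entries once it is full. The class `FixedCapacityMemoryBank`, the `FrameEntry` record, and the first-in-first-out eviction policy are hypothetical choices made for this sketch, not details taken from StreamingEval.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence, Tuple


@dataclass(frozen=True)
class FrameEntry:
    """One stored frame: its timestamp (seconds) and a feature vector."""
    timestamp: float
    features: Tuple[float, ...]


class FixedCapacityMemoryBank:
    """Bounded store of per-frame features.

    Hypothetical sketch: FIFO eviction is an assumption made for
    illustration, not the policy described in the StreamingEval paper.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # deque(maxlen=...) drops the oldest item automatically when full.
        self._entries: Deque[FrameEntry] = deque(maxlen=capacity)
        self.evicted = 0  # number of frames dropped so far

    def add(self, timestamp: float, features: Sequence[float]) -> None:
        """Append a new frame, counting an eviction if the bank is full."""
        if len(self._entries) == self.capacity:
            self.evicted += 1
        self._entries.append(FrameEntry(timestamp, tuple(features)))

    def snapshot(self) -> List[FrameEntry]:
        """Frames visible to the model when a query arrives, oldest first."""
        return list(self._entries)

    def __len__(self) -> int:
        return len(self._entries)


if __name__ == "__main__":
    bank = FixedCapacityMemoryBank(capacity=4)
    # Simulate a 1 fps stream of 10 frames with 2-d dummy features.
    for t in range(10):
        bank.add(float(t), [t * 0.1, t * 0.2])
    print(len(bank), bank.evicted, [e.timestamp for e in bank.snapshot()])
```

Running the script prints `4 6 [6.0, 7.0, 8.0, 9.0]`: after ten frames only the four most recent remain. Because memory use is capped by `capacity` regardless of stream length, holding the capacity fixed plausibly lets different models be compared under an identical storage budget; other eviction policies (for example, merging or subsampling frames) would fit the same interface.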

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition