OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
Yifei Li, Junbo Niu, Ziyang Miao, Chunjiang Ge, Yuanhang Zhou, Qihao He, Xiaoyi Dong, Haodong Duan, Shuangrui Ding, Rui Qian, Pan Zhang, Yuhang Zang, Yuhang Cao, Conghui He, Jiaqi Wang

TL;DR
OVO-Bench is a new benchmark that evaluates how well online video LLMs understand and reason about video in real time, with a focus on temporal awareness and on responses that depend on when a question is asked.
Contribution
The paper introduces OVO-Bench, a benchmark with new tasks and annotations for assessing online video understanding in video LLMs, a capability that existing benchmarks do not adequately evaluate.
Findings
Current models underperform on online video understanding tasks.
A significant gap remains between model performance and human-level reasoning.
The benchmark reveals that models struggle with temporal reasoning and real-time comprehension.
Abstract
Temporal Awareness, the ability to reason dynamically based on the timestamp when a question is raised, is the key distinction between offline and online video LLMs. Unlike offline models, which rely on complete videos for static, post hoc analysis, online models process video streams incrementally and dynamically adapt their responses based on the timestamp at which the question is posed. Despite its significance, temporal awareness has not been adequately evaluated in existing benchmarks. To fill this gap, we present OVO-Bench (Online-VideO-Benchmark), a novel video benchmark that emphasizes the importance of timestamps for advanced online video understanding capability benchmarking. OVO-Bench evaluates the ability of video LLMs to reason and respond to events occurring at specific timestamps under three distinct scenarios: (1) Backward tracing: trace back to past events to answer the…
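The offline/online distinction described in the abstract can be illustrated with a minimal sketch. It is not taken from the OVO-Bench code: the `Frame` class, the `visible_frames` helper, and the 1 frame-per-second stream are hypothetical names and values chosen only to show that an online model may use nothing beyond the frames that arrived at or before the query timestamp.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Hypothetical container for one frame of a video stream."""
    timestamp: float  # seconds from the start of the stream
    data: str         # placeholder standing in for pixel data


def visible_frames(frames: List[Frame], query_time: float) -> List[Frame]:
    """Return only the frames an online model could have seen by query_time."""
    return [f for f in frames if f.timestamp <= query_time]


# Assumed example: a 10-second stream sampled at 1 frame per second.
stream = [Frame(timestamp=float(t), data=f"frame-{t}") for t in range(10)]

# An offline model would receive all 10 frames. An online model queried
# at t = 3.5 s receives only the frames at 0, 1, 2 and 3 s.
online_context = visible_frames(stream, query_time=3.5)
print([f.timestamp for f in online_context])  # prints [0.0, 1.0, 2.0, 3.0]
```

Under this framing, the same question asked at two different timestamps is answered from two different contexts, so the correct answer can change with the query time.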
Taxonomy
Topics: Video Analysis and Summarization
Methods: High-Order Consensuses
