CueBench: Advancing Unified Understanding of Context-Aware Video Anomalies in Real-World

Yating Yu; Congqi Cao; Zhaoying Wang; Weihua Meng; Jie Li; Yuxin Li; Zihao Wei; Zhongpei Shen; Jiajun Zhang

arXiv:2511.00613·cs.CV·November 4, 2025

TL;DR

CueBench is a comprehensive benchmark for evaluating context-aware video anomaly understanding in real-world scenarios, highlighting current models' limitations and proposing a new fine-tuning approach that significantly improves performance.

Contribution

The paper introduces CueBench, a novel unified benchmark with a hierarchical taxonomy for context-aware video anomalies, and proposes Cue-R1, a reinforcement fine-tuning method that outperforms existing models.

Findings

1. Existing vision-language models perform poorly on real-world anomaly understanding.
2. Cue-R1 surpasses state-of-the-art methods by over 24% on CueBench.
3. CueBench provides a rigorous evaluation framework for diverse anomaly tasks.

Abstract

How far are deep models from real-world video anomaly understanding (VAU)? Current works typically emphasize either detecting unexpected occurrences that deviate from normal patterns or comprehending anomalous events through interpretable descriptions. However, they exhibit only a superficial comprehension of real-world anomalies, with limited coverage of the complex principles and subtle context that distinguish anomalies from normal events, e.g., climbing cliffs with safety gear vs. without it. To this end, we introduce CueBench, the first-of-its-kind Benchmark devoted to Context-aware video anomalies within a Unified Evaluation framework. We comprehensively establish an event-centric hierarchical taxonomy that anchors two core event types: 14 conditional and 18 absolute anomaly events, defined by their refined semantics from diverse contexts across 174 scenes and 198 attributes. Based on this, we…

Code & Models

Datasets

CueBench/CueBench
dataset · 28 downloads

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Video Analysis and Summarization