Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method
Chao Huang, Benfeng Wang, Wei Wang, Jie Wen, Li Shen, Wenqi Ren, Yong Xu, Xiaochun Cao

TL;DR
This paper introduces a new benchmark dataset and method for Video Anomaly Reasoning (VAR), enabling multi-stage, structured reasoning in video anomaly detection and understanding, with improved performance over existing approaches.
Contribution
It defines the novel task of VAR, creates a large annotated dataset with structured reasoning chains, and proposes an MLLM-based model supporting adaptive hierarchical reasoning and decision making.
Findings
The dataset contains 8,641 videos and more than 50,000 annotated reasoning samples.
The proposed Vad-R1-Plus model outperforms existing baselines on VAR tasks.
Systematic evaluation shows improved reasoning and decision-making capabilities.
Abstract
Recent progress in the reasoning capabilities of Multimodal Large Language Models (MLLMs) has highlighted their potential for performing complex video understanding tasks. However, in the domain of Video Anomaly Detection and Understanding (VAD&U), existing MLLM-based methods are largely limited to anomaly localization or post-hoc description, lacking explicit reasoning processes, risk awareness, and decision-oriented interpretation. To address this gap, we define a new task termed Video Anomaly Reasoning (VAR), which elevates video anomaly analysis from descriptive understanding to structured, multi-stage reasoning. VAR explicitly requires models to perform progressive reasoning over anomalous events before answering anomaly-related questions, encompassing visual perception, causal interpretation, and risk-aware decision making. To support this task, we present a new dataset with 8,641…
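To make the idea of "progressive reasoning before answering" concrete, the sketch below models one VAR sample as an ordered chain of reasoning steps followed by an answer. It is a minimal illustration under stated assumptions, not the dataset's actual annotation format: the class names (`ReasoningStep`, `VARSample`), the stage labels in `STAGE_ORDER`, the method `is_progressive`, and the example video ID and text are all hypothetical. The stage labels are simply taken from the three capabilities the abstract lists. The check only enforces that stages never move backward and that an answer is present; it does not assume every stage must appear, since the paper's adaptive mechanism is not described here.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stage labels, derived from the three capabilities named in the
# abstract (visual perception, causal interpretation, risk-aware decision making).
# The real dataset's schema may use different names or more stages.
STAGE_ORDER = ("perception", "causal_interpretation", "risk_decision")


@dataclass
class ReasoningStep:
    stage: str  # expected to be one of STAGE_ORDER
    text: str   # free-form reasoning produced at this stage


@dataclass
class VARSample:
    video_id: str
    question: str
    steps: List[ReasoningStep] = field(default_factory=list)
    answer: str = ""

    def is_progressive(self) -> bool:
        """Return True if all stages are known, never move backward, and an answer exists."""
        if not self.answer:
            return False  # reasoning must conclude with an answer
        last = -1
        for step in self.steps:
            if step.stage not in STAGE_ORDER:
                return False  # unknown stage label
            idx = STAGE_ORDER.index(step.stage)
            if idx < last:
                return False  # went back to an earlier stage
            last = idx
        return True


# Usage with a made-up example (hypothetical ID and text).
sample = VARSample(
    video_id="demo_0001",
    question="Is there an anomaly in this clip, and what should be done?",
    steps=[
        ReasoningStep("perception", "A person climbs over a fence at night."),
        ReasoningStep("causal_interpretation", "Climbing a fence suggests unauthorized entry."),
        ReasoningStep("risk_decision", "Possible trespassing; alert security."),
    ],
    answer="Yes: likely trespassing; notify security.",
)
print(sample.is_progressive())  # True
```

Representing stages as an ordered tuple keeps the ordering check to a single index comparison; a richer schema (for example, per-stage timestamps or confidence scores) would be an extension beyond what the abstract specifies.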
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications · Human Pose and Action Recognition
