Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method

Chao Huang; Benfeng Wang; Wei Wang; Jie Wen; Li Shen; Wenqi Ren; Yong Xu; Xiaochun Cao

arXiv:2601.10165 · cs.CV · January 16, 2026

TL;DR

This paper introduces a new benchmark dataset and method for Video Anomaly Reasoning (VAR), enabling multi-stage, structured reasoning in video anomaly detection and understanding, with improved performance over existing approaches.

Contribution

It defines the novel task of VAR, creates a large annotated dataset with structured reasoning chains, and proposes an MLLM-based model supporting adaptive hierarchical reasoning and decision making.

Findings

01

The dataset contains 8,641 videos with more than 50,000 annotated reasoning samples.

02

The proposed Vad-R1-Plus model outperforms existing baselines on VAR tasks.

03

Systematic evaluation shows improved reasoning and decision-making capabilities.

Abstract

Recent progress in the reasoning capabilities of Multimodal Large Language Models (MLLMs) has highlighted their potential for performing complex video understanding tasks. However, in the domain of Video Anomaly Detection and Understanding (VAD&U), existing MLLM-based methods are largely limited to anomaly localization or post-hoc description, lacking explicit reasoning processes, risk awareness, and decision-oriented interpretation. To address this gap, we define a new task termed Video Anomaly Reasoning (VAR), which elevates video anomaly analysis from descriptive understanding to structured, multi-stage reasoning. VAR explicitly requires models to perform progressive reasoning over anomalous events before answering anomaly-related questions, encompassing visual perception, causal interpretation, and risk-aware decision making. To support this task, we present a new dataset with 8,641…

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications · Human Pose and Action Recognition