Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought

Chao Huang; Benfeng Wang; Jie Wen; Chengliang Liu; Wei Wang; Li Shen; Xiaochun Cao

arXiv:2505.19877·cs.CV·May 27, 2025

Open Access · 1 Repo · 1 Model

TL;DR

Vad-R1 is a framework for video anomaly reasoning that guides multimodal models through explicit step-by-step analysis, moving beyond shallow anomaly descriptions toward deeper understanding of anomalies in videos.

Contribution

The paper presents Vad-R1, an end-to-end MLLM-based framework built around a Perception-to-Cognition Chain-of-Thought (P2C-CoT), together with Vad-Reasoning, a dedicated dataset for video anomaly reasoning.

Findings

01 Vad-R1 outperforms existing models on VAD and VAR tasks.

02 The P2C-CoT reasoning process enhances anomaly understanding.

03 The AVA-GRPO algorithm improves reasoning accuracy with limited annotations.

Abstract

Recent advances in the reasoning capability of Multimodal Large Language Models (MLLMs) demonstrate their effectiveness in tackling complex visual tasks. However, existing MLLM-based Video Anomaly Detection (VAD) methods remain limited to shallow anomaly descriptions without deep reasoning. In this paper, we propose a new task named Video Anomaly Reasoning (VAR), which aims to enable deep analysis and understanding of anomalies in videos by requiring MLLMs to think explicitly before answering. To this end, we propose Vad-R1, an end-to-end MLLM-based framework for VAR. Specifically, we design a Perception-to-Cognition Chain-of-Thought (P2C-CoT) that simulates the human process of recognizing anomalies, guiding the MLLM to reason about anomalies step by step. Based on the structured P2C-CoT, we construct Vad-Reasoning, a dedicated dataset for VAR. Furthermore, we propose an improved…
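The "think explicitly before answering" requirement can be illustrated with a small output check. The following is a minimal sketch, not the authors' implementation: it assumes the model wraps its reasoning and verdict in `<think>` and `<answer>` tags, which are hypothetical names chosen for illustration and are not given in the abstract. The helper `parse_reasoned_output` is likewise hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ReasonedAnswer(NamedTuple):
    reasoning: str  # the explicit reasoning text
    answer: str     # the final verdict produced after reasoning


# ASSUMPTION: the tag names below are illustrative, not taken from the paper.
_PATTERN = re.compile(
    r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL
)


def parse_reasoned_output(text: str) -> Optional[ReasonedAnswer]:
    """Return the reasoning and answer, or None if the output skips reasoning."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    reasoning, answer = (part.strip() for part in match.groups())
    if not reasoning:
        # An empty reasoning block does not count as thinking before answering.
        return None
    return ReasonedAnswer(reasoning, answer)


if __name__ == "__main__":
    sample = "<think>A person climbs a fence at night.</think><answer>Abnormal</answer>"
    print(parse_reasoned_output(sample))
    print(parse_reasoned_output("Abnormal"))
```

A check of this kind only verifies that reasoning precedes the answer; it says nothing about whether the reasoning follows the perception-to-cognition structure described above.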

Code & Models

Repositories

wbfwonderful/vad-r1 (Official)

Models

🤗 nvidia/Cosmos-Embed1-448p-anomaly-detection
model · 380 downloads · ♡ 5

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis