Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Hang Du; Sicheng Zhang; Binzhu Xie; Guoshun Nan; Jiayang Zhang; Junrui Xu; Hangyu Liu; Sicong Leng; Jiangming Liu; Hehe Fan; Dajiu Huang; Jing Feng; Linli Chen; Can Zhang; Xuhuan Li; Hao Zhang; Jianhang Chen; Qimei Cui; Xiaofeng Tao

arXiv:2405.00181·cs.CV·March 30, 2026

TL;DR

This paper introduces a comprehensive benchmark and an evaluation metric for understanding what anomaly occurred in a video, why it happened, and how severe it is, moving beyond detection toward explanation.

Contribution

The paper presents the CUVA benchmark, with detailed annotations for anomaly type, cause, and effect, together with MMEval, a new metric for assessing causation understanding in videos.

Findings

01. The MMEval metric better aligns with human judgment.

02. The prompt-based approach outperforms existing methods.

03. The dataset enables detailed causation analysis of video anomalies.

Abstract

Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos, thereby enabling various applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks primarily concentrate on anomaly detection and localization, our focus is on more practicality, prompting us to raise the following crucial questions: "what anomaly occurred?", "why did it happen?", and "how severe is this abnormal event?". In pursuit of these answers, we present a comprehensive benchmark for Causation Understanding of Video Anomaly (CUVA). Specifically, each instance of the proposed benchmark involves three sets of human annotations to indicate the "what", "why" and "how" of an anomaly, including 1) anomaly type, start and end times, and event descriptions, 2) natural language explanations for the cause of an anomaly, and 3) free text reflecting…

Code & Models

Repositories

fesvhtr/CUVA
github

Datasets

fesvhtr/CUVA
dataset · 445 downloads

Videos

Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly· slideslive