Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

Huaxin Zhang; Xiaohao Xu; Xiang Wang; Jialong Zuo; Chuchu Han; Xiaonan Huang; Changxin Gao; Yuehuan Wang; Nong Sang

arXiv:2406.12235·cs.CV·July 2, 2024·3 cites

TL;DR

Holmes-VAD is a multimodal, explainable video anomaly detection framework that uses large-scale instruction tuning and rich annotations to reduce detection bias and improve interpretability on challenging or unseen events.

Contribution

The paper presents the first large-scale multimodal VAD instruction-tuning benchmark, VAD-Instruct50k, and a novel framework, Holmes-VAD, that reduces detection bias and improves interpretability in video anomaly detection.

Findings

01. Holmes-VAD achieves improved anomaly localization accuracy.

02. The framework provides comprehensive explanations for detections.

03. Benchmark and model are publicly available for community use.

Abstract

Towards open-ended Video Anomaly Detection (VAD), existing methods often exhibit biased detection when faced with challenging or unseen events and lack interpretability. To address these drawbacks, we propose Holmes-VAD, a novel framework that leverages precise temporal supervision and rich multimodal instructions to enable accurate anomaly localization and comprehensive explanations. Firstly, towards an unbiased and explainable VAD system, we construct the first large-scale multimodal VAD instruction-tuning benchmark, i.e., VAD-Instruct50k. This dataset is created using a carefully designed semi-automatic labeling paradigm. Efficient single-frame annotations are applied to the collected untrimmed videos, which are then synthesized into high-quality analyses of both abnormal and normal video clips using a robust off-the-shelf video captioner and a large language model (LLM). Building upon…
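The semi-automatic labeling paradigm described in the abstract (single-frame annotation, then clip captioning, then LLM analysis) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `Clip`, `clip_around_frame`, and `build_instruction`, the `half_window` value, and the prompt wording are all hypothetical, and the caption is passed in as a plain string rather than produced by a real video captioner.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    video_id: str
    start: float  # seconds
    end: float    # seconds
    is_abnormal: bool


def clip_around_frame(video_id, frame_time, duration, half_window=4.0):
    """Expand one annotated frame timestamp into a clip clamped to the video.

    half_window is an illustrative value, not a setting taken from the paper.
    """
    start = max(0.0, frame_time - half_window)
    end = min(duration, frame_time + half_window)
    return Clip(video_id, start, end, is_abnormal=True)


def build_instruction(clip, caption):
    """Pair a clip caption with its label to form one LLM request.

    The record layout and prompt text are hypothetical placeholders.
    """
    label = "abnormal" if clip.is_abnormal else "normal"
    prompt = (
        f"The following clip is labeled {label}.\n"
        f"Caption: {caption}\n"
        "Explain why the clip is or is not anomalous."
    )
    return {
        "video_id": clip.video_id,
        "span": (clip.start, clip.end),
        "label": label,
        "prompt": prompt,
    }


# One single-frame annotation at t = 2.0 s in a 10-second video.
clip = clip_around_frame("video_001", frame_time=2.0, duration=10.0)
sample = build_instruction(clip, "A person climbs over a fence.")
print(sample["span"], sample["label"])
```

In a real pipeline the caption would come from the video captioner and the prompt would be sent to the LLM; both steps are left out here because the abstract does not name the models or prompts used.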

Code & Models

Repositories

pipixin321/holmesvad (PyTorch, official)

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Digital Media Forensic Detection