No Need For Real Anomaly: MLLM Empowered Zero-Shot Video Anomaly Detection

Zunkai Dai; Ke Li; Jiajia Liu; Jie Yang; Yuanyuan Qiao

arXiv:2602.19248 · cs.CV · March 24, 2026

Open Access

TL;DR

LAVIDA is a zero-shot video anomaly detection framework that uses pseudo-anomalies and multimodal language models to improve detection in open-world scenarios without requiring real anomaly data.

Contribution

The paper introduces LAVIDA, a novel zero-shot VAD method combining pseudo-anomaly generation and multimodal language models for better anomaly understanding.

Findings

1. Achieves state-of-the-art results on four benchmark datasets.
2. Effective in both frame-level and pixel-level anomaly detection.
3. Operates without any real anomaly training data.

Abstract

The collection and detection of video anomaly data have long been a challenging problem due to its rare occurrence and spatio-temporal scarcity. Existing video anomaly detection (VAD) methods underperform in open-world scenarios. Key contributing factors include limited dataset diversity and inadequate understanding of context-dependent anomalous semantics. To address these issues, i) we propose LAVIDA, an end-to-end zero-shot video anomaly detection framework. ii) LAVIDA employs an Anomaly Exposure Sampler that transforms segmented objects into pseudo-anomalies to enhance model adaptability to unseen anomaly categories. It further integrates a Multimodal Large Language Model (MLLM) to bolster semantic comprehension capabilities. Additionally, iii) we design a token compression approach based on reverse attention to handle the spatio-temporal scarcity of anomalous patterns and decrease…
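
To make the pseudo-anomaly idea concrete, the sketch below pastes a segmented object into a normal frame and returns a matching pixel-level label. It is only an illustration of the general "segmented object as pseudo-anomaly" idea: the function name `paste_pseudo_anomaly`, its arguments, and the simple cut-and-paste strategy are assumptions made here, not the paper's Anomaly Exposure Sampler, whose actual procedure is not given in the abstract.

```python
import numpy as np


def paste_pseudo_anomaly(frame, obj_patch, obj_mask, top, left):
    """Paste a segmented object into a frame (hypothetical helper).

    frame:     (H, W, C) array, a normal video frame
    obj_patch: (h, w, C) array, pixels of the segmented object
    obj_mask:  (h, w) array, nonzero where the object is present
    top, left: where the patch's upper-left corner lands in the frame

    Returns the modified copy of the frame and an (H, W) uint8 label
    that is 1 on pasted object pixels and 0 elsewhere.
    """
    h, w = obj_mask.shape
    H, W = frame.shape[:2]
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("object patch does not fit inside the frame")

    out = frame.copy()                      # leave the input frame untouched
    label = np.zeros((H, W), dtype=np.uint8)
    m = obj_mask.astype(bool)

    # Slices are views, so boolean assignment writes into out / label.
    out[top:top + h, left:left + w][m] = obj_patch[m]
    label[top:top + h, left:left + w][m] = 1
    return out, label


# Tiny usage example with synthetic data.
frame = np.zeros((8, 8, 3), dtype=np.uint8)
obj_patch = np.full((2, 3, 3), 255, dtype=np.uint8)
obj_mask = np.array([[1, 0, 1],
                     [0, 1, 0]])
new_frame, label = paste_pseudo_anomaly(frame, obj_patch, obj_mask, top=2, left=3)
```

The returned label gives pixel-level supervision, and a frame-level label follows from whether any pixel is set, so one synthetic sample could in principle serve both evaluation granularities mentioned in the findings.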

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Video Analysis and Summarization