The Evolution of Video Anomaly Detection: A Unified Framework from DNN to MLLM

Shibo Gao; Peipei Yang; Haiyang Guo; Yangyang Liu; Yi Chen; Shuai Li; Han Zhu; Jian Xu; Xu-Yao Zhang; Linlin Huang

arXiv:2507.21649·cs.CV·July 30, 2025

TL;DR

This paper provides a comprehensive survey and a unified framework for video anomaly detection, highlighting the impact of large language models and multi-modal large language models on the evolution of the field.

Contribution

It introduces a unified framework for DNN and LLM-based VAD methods and analyzes recent advancements and challenges in the era of large models.

Findings

01

MLLMs and LLMs significantly enhance VAD capabilities.

02

A classification system for VAD methods is proposed.

03

Key challenges and future directions in VAD are identified.

Abstract

Video anomaly detection (VAD) aims to identify and ground anomalous behaviors or events in videos, serving as a core technology in the fields of intelligent surveillance and public safety. With the advancement of deep learning, the continuous evolution of deep model architectures has driven innovation in VAD methodologies, significantly enhancing feature representation and scene adaptability, thereby improving algorithm generalization and expanding application boundaries. More importantly, the rapid development of multi-modal large language models (MLLMs) and large language models (LLMs) has introduced new opportunities and challenges to the VAD field. With the support of MLLMs and LLMs, VAD has undergone significant transformations in terms of data annotation, input modalities, model architectures, and task objectives. The surge in publications and the evolution of tasks have created an…
