MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection

Shengtian Yang; Yue Feng; Yingshi Liu; Jingrou Zhang; Jie Qin

arXiv:2510.21449 · cs.CV · October 27, 2025

Open Access

TL;DR

MoniTor introduces a real-time, training-free online video anomaly detection method leveraging large vision-language models, a novel memory-based scoring queue, and LSTM-inspired prediction to effectively identify anomalies in surveillance videos.

Contribution

The paper presents MoniTor, a novel online VAD framework that uses pre-trained models, a memory-based scoring queue, and LSTM-inspired prediction for real-time anomaly detection without training.

Findings

01. Outperforms state-of-the-art methods on UCF-Crime and XD-Violence datasets.

02. Achieves competitive results with weakly supervised methods without training.

03. Effectively models temporal dependencies for anomaly detection.

Abstract

Video Anomaly Detection (VAD) aims to locate unusual activities or behaviors within videos. Recently, offline VAD has garnered substantial research attention, which has been invigorated by the progress in large language models (LLMs) and vision-language models (VLMs), offering the potential for a more nuanced understanding of anomalies. However, online VAD has seldom received attention due to real-time constraints and computational intensity. In this paper, we introduce a novel Memory-based online scoring queue scheme for Training-free VAD (MoniTor), to address the inherent complexities in online VAD. Specifically, MoniTor applies a streaming input to VLMs, leveraging the capabilities of pre-trained large-scale models. To capture temporal dependencies more effectively, we incorporate a novel prediction mechanism inspired by Long Short-Term Memory (LSTM) networks. This ensures the model…
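
To make the idea of an online, memory-based scoring queue concrete, here is a minimal Python sketch. It is not the authors' implementation: the function name `online_scores`, the `score_frame` callable (a stand-in for a per-frame score from a pre-trained VLM/LLM), and the `memory_size` and `blend` parameters are all hypothetical, and the simple averaging rule is an assumption chosen for illustration rather than the LSTM-inspired mechanism described in the paper.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator


def online_scores(
    frames: Iterable[object],
    score_frame: Callable[[object], float],  # hypothetical stand-in for a VLM/LLM scorer
    memory_size: int = 8,                    # assumed queue length, not from the paper
    blend: float = 0.5,                      # assumed weight on the current raw score
) -> Iterator[float]:
    """Yield one smoothed anomaly score per frame, using only past frames."""
    # Bounded FIFO queue: once full, the oldest score is dropped automatically.
    memory: Deque[float] = deque(maxlen=memory_size)
    for frame in frames:
        raw = score_frame(frame)
        if memory:
            # Blend the current score with the mean of the remembered scores.
            context = sum(memory) / len(memory)
            smoothed = blend * raw + (1.0 - blend) * context
        else:
            # First frame: no history yet, so use the raw score as-is.
            smoothed = raw
        memory.append(smoothed)
        yield smoothed
```

Because the generator consumes frames one at a time and never looks ahead, it matches the online setting: each score depends only on the current frame and a fixed-size memory of earlier scores.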

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition