Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM

Junxiao Ma; Jingjing Wang; Jiamin Luo; Peiying Yu; Guodong Zhou

arXiv:2502.18863·cs.CV·February 27, 2025

TL;DR

Sherlock is a global-local spatial-sensitive large language model for multi-scene video abnormal event extraction and localization. It captures the structured semantic information of an abnormal event (what, when, and where) that frame-level anomaly detection ignores.

Contribution

The paper proposes the M-VAE task and a new Sherlock model with specialized modules to address global-local spatial modeling and balancing challenges in video anomaly detection.

Findings

01

Sherlock outperforms several advanced Video-LLMs on the M-VAE dataset.

02

Global-local spatial information significantly improves anomaly extraction accuracy.

03

The proposed GSM and SIR modules effectively address spatial modeling challenges.

Abstract

Prior studies on Video Anomaly Detection (VAD) mainly focus on detecting whether each video frame is abnormal, largely ignoring the structured video semantic information (i.e., what, when, and where the abnormal event happens). With this in mind, we propose a new chat-paradigm Multi-scene Video Abnormal Event Extraction and Localization (M-VAE) task, aiming to extract abnormal event quadruples (i.e., subject, event type, object, scene) and localize such events. Further, this paper argues that the new task faces two key challenges, i.e., global-local spatial modeling and global-local spatial balancing. To this end, this paper proposes a Global-local Spatial-sensitive Large Language Model (LLM) named Sherlock, i.e., acting like Sherlock Holmes to track down criminal events, for the M-VAE task. Specifically, this model designs a Global-local…
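To make the task output concrete, the quadruple described in the abstract can be sketched as a small record type. This is an illustrative sketch only, not the authors' data format: the class name `AbnormalEvent`, the field names, the use of a start/end time span in seconds for localization, and the `temporal_iou` helper are all assumptions introduced here. Temporal IoU is a common way to score a predicted span against a reference span, but the listing does not say which metric the paper uses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AbnormalEvent:
    """Hypothetical record for one extracted abnormal event."""
    subject: str            # who or what performs the event
    event_type: str         # kind of abnormal event
    obj: Optional[str]      # what the event acts on, if anything
    scene: str              # where the event takes place
    span: Tuple[float, float]  # assumed localization: (start_s, end_s)

    def quadruple(self) -> Tuple[str, str, Optional[str], str]:
        """Return the (subject, event type, object, scene) quadruple."""
        return (self.subject, self.event_type, self.obj, self.scene)


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two time spans given as (start, end)."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Made-up example values, not taken from the M-VAE dataset.
    predicted = AbnormalEvent("man", "stealing", "bag", "street", (2.0, 6.0))
    reference_span = (4.0, 8.0)
    print(predicted.quadruple())
    print(temporal_iou(predicted.span, reference_span))
```

Keeping the quadruple separate from the localization span reflects the two parts of the task (extraction and localization), so each could be scored independently under this assumed representation.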

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications · Video Analysis and Summarization