Hawk: Learning to Understand Open-World Video Anomalies

Jiaqi Tang; Hao Lu; Ruizheng Wu; Xiaogang Xu; Ke Ma; Cheng Fang; Bin Guo; Jiangbo Lu; Qifeng Chen; Ying-Cong Chen

arXiv:2405.16886·cs.CV·May 28, 2024·1 cites

TL;DR

Hawk is a novel framework that enhances open-world video anomaly detection by integrating motion-aware visual language models, auxiliary consistency loss, and extensive annotated datasets for improved interpretation and question-answering.

Contribution

Hawk introduces a new approach combining motion modality and large visual language models with annotated datasets for superior open-world video anomaly understanding.

Findings

01 Achieves state-of-the-art performance in video description generation.

02 Outperforms baselines in open-world question-answering.

03 Effectively integrates motion and language understanding.

Abstract

Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs. However, current VAD systems are often limited by their superficial semantic understanding of scenes and minimal user interaction. Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios. In this paper, we introduce Hawk, a novel framework that leverages interactive large Visual Language Models (VLMs) to interpret video anomalies precisely. Recognizing the difference in motion information between abnormal and normal videos, Hawk explicitly integrates the motion modality to enhance anomaly identification. To reinforce motion attention, we construct an auxiliary consistency loss within the motion and video space, guiding the video branch to focus on the motion modality. Moreover, to…
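
To make the idea of an auxiliary consistency loss concrete, here is a minimal sketch, not the paper's actual formulation. The abstract does not say how the loss is computed, so this example assumes one common choice: the mean of one minus the cosine similarity between paired video and motion feature vectors. The function names `cosine_similarity` and `consistency_loss` and the plain-list feature format are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat a zero vector as carrying no directional information.
        return 0.0
    return dot / (norm_a * norm_b)


def consistency_loss(video_feats, motion_feats):
    """Mean (1 - cosine similarity) over paired video/motion features.

    ASSUMPTION: this cosine form is an illustrative stand-in; the source
    text only states that a consistency loss is built "within the motion
    and video space".
    """
    if len(video_feats) != len(motion_feats):
        raise ValueError("video and motion features must be paired one-to-one")
    if not video_feats:
        raise ValueError("at least one feature pair is required")
    losses = [
        1.0 - cosine_similarity(v, m)
        for v, m in zip(video_feats, motion_feats)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical toy features: the first pair is aligned, the second is orthogonal.
    video = [[1.0, 0.0], [0.0, 1.0]]
    motion = [[2.0, 0.0], [1.0, 0.0]]
    print(consistency_loss(video, motion))
```

Under this assumed form, the loss is zero when every video feature points in the same direction as its motion counterpart and grows as they diverge, which is one way a video branch could be nudged toward motion-relevant content. In a real training setup such a term would typically be added to the main language-modeling objective with a weighting factor.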

Code & Models

Repositories

jqtangust/hawk
PyTorch · Official

Models

🤗
Jiaqi-hkust/hawk
model · ♡ 3

Datasets

Jiaqi-hkust/hawk
dataset · 963 downloads

Taxonomy

Topics: Digital Media Forensic Detection

Methods: Focus