VAGU & GtS: LLM-Based Benchmark and Framework for Joint Video Anomaly Grounding and Understanding

Shibo Gao; Peipei Yang; Yangyang Liu; Yi Chen; Han Zhu; Xuyao Zhang; Linlin Huang

arXiv:2507.21507 · cs.CV · July 30, 2025

TL;DR

This paper introduces VAGU, a benchmark dataset for joint video anomaly grounding and understanding, and proposes GtS (Glance then Scrutinize), a training-free, prompt-guided framework for detailed anomaly understanding, along with a new evaluation metric, JeAUG.

Contribution

The paper presents the first dataset supporting both anomaly grounding and understanding, and introduces a training-free, prompt-guided framework for detailed anomaly analysis.

Findings

01

The VAGU dataset supports joint anomaly grounding and understanding tasks.

02

The GtS framework achieves coarse localization and detailed interpretation without training.

03

The JeAUG metric provides a balanced evaluation of semantic interpretability and temporal accuracy.

Abstract

Video Anomaly Detection (VAD) aims to identify anomalous events in videos and accurately determine their time intervals. Current VAD methods mainly fall into two categories: traditional DNN-based approaches that focus on temporal localization, and LLM-based approaches that emphasize semantic understanding. Both anomaly understanding and grounding are essential for comprehensive video anomaly detection and can complement each other. However, no existing model or dataset supports both tasks simultaneously. To address this, we introduce VAGU (Video Anomaly Grounding and Understanding), the first benchmark to integrate both tasks. Each VAGU instance includes annotations for anomaly category, semantic explanation, precise temporal grounding and Video QA. We also provide multiple-choice Video QA for objective evaluation. Based on this dataset, we propose Glance then Scrutinize (GtS), a…
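
As a rough illustration of the annotation layout the abstract describes (anomaly category, semantic explanation, temporal grounding, and multiple-choice Video QA), the sketch below models one instance as a Python dataclass. All class and field names here are hypothetical and are not taken from the VAGU release. The `temporal_iou` helper is the generic intersection-over-union measure commonly used to score temporal grounding; it is included only as an example and is not the paper's JeAUG metric.

```python
from dataclasses import dataclass, field


@dataclass
class MultipleChoiceQA:
    """One multiple-choice question about a video (hypothetical layout)."""
    question: str
    options: list[str]
    answer_index: int  # index into `options` of the correct choice


@dataclass
class AnomalyAnnotation:
    """One annotated video instance (field names are assumptions)."""
    category: str      # anomaly category label
    explanation: str   # free-text semantic explanation
    start_sec: float   # start of the anomalous interval, in seconds
    end_sec: float     # end of the anomalous interval, in seconds
    qa: list[MultipleChoiceQA] = field(default_factory=list)


def temporal_iou(pred: tuple[float, float], gt: tuple[float, float]) -> float:
    """Intersection over union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# Usage with made-up values.
example = AnomalyAnnotation(
    category="example_category",
    explanation="Placeholder explanation text.",
    start_sec=5.0,
    end_sec=15.0,
    qa=[MultipleChoiceQA("What happens?", ["A", "B", "C", "D"], 1)],
)
score = temporal_iou((0.0, 10.0), (example.start_sec, example.end_sec))
```

Keeping the interval as plain start and end seconds makes overlap-based scoring a one-line computation, while the QA list stays optional for instances without questions.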
