GridVAD: Open-Set Video Anomaly Detection via Spatial Reasoning over Stratified Frame Grids

Mohamed Eltahir; Ahmed O. Ibrahim; Obada Siralkhatim; Tabarak Abdallah; Sondos Mohamed

arXiv:2603.25467 · cs.CV · April 3, 2026

TL;DR

GridVAD introduces a training-free, open-set video anomaly detection method in which vision-language models reason over stratified frame grids. It achieves the highest pixel-level AUROC among zero-shot methods on UCSD Ped2 and is 2.7x more call-efficient than uniform per-frame VLM querying.
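
To make "stratified frame grids" concrete, here is a minimal sketch under stated assumptions: the clip is split into contiguous, near-equal temporal strata, one frame is taken from each (the middle frame, or a random one when a generator is passed), and the chosen frames are tiled row-major into a single image that a VLM can inspect at once. The function names, the middle-frame rule, and the 2x2 layout are all hypothetical; this summary does not specify GridVAD's actual sampling rule or grid size. The helper `grid_to_frame` shows how a location the VLM refers to in the grid could be mapped back to a source frame.

```python
from typing import List, Optional, Tuple
import numpy as np

def stratified_indices(num_frames: int, num_strata: int,
                       rng: Optional[np.random.Generator] = None) -> List[int]:
    """Pick one frame index from each of `num_strata` contiguous, near-equal strata.

    With rng=None the middle frame of each stratum is used; otherwise a frame
    is drawn uniformly at random inside each stratum. (Hypothetical scheme.)
    """
    if not 1 <= num_strata <= num_frames:
        raise ValueError("need 1 <= num_strata <= num_frames")
    strata = np.array_split(np.arange(num_frames), num_strata)
    if rng is None:
        return [int(s[len(s) // 2]) for s in strata]
    return [int(rng.choice(s)) for s in strata]


def make_grid(frames: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Tile frames of shape (rows*cols, H, W, C) row-major into one (rows*H, cols*W, C) image."""
    n, h, w, c = frames.shape
    if n != rows * cols:
        raise ValueError(f"expected {rows * cols} frames, got {n}")
    return (frames.reshape(rows, cols, h, w, c)
                  .transpose(0, 2, 1, 3, 4)   # -> (rows, H, cols, W, C)
                  .reshape(rows * h, cols * w, c))


def grid_to_frame(y: int, x: int, frame_h: int, frame_w: int,
                  cols: int) -> Tuple[int, int, int]:
    """Map grid pixel (y, x) back to (tile index, y in frame, x in frame)."""
    r, yy = divmod(y, frame_h)
    c, xx = divmod(x, frame_w)
    return r * cols + c, yy, xx


# Usage on a placeholder clip of 64 frames:
video = np.zeros((64, 240, 360, 3), dtype=np.uint8)
idx = stratified_indices(len(video), 4)        # [8, 24, 40, 56]
grid = make_grid(video[idx], rows=2, cols=2)   # shape (480, 720, 3)
```

Sampling one frame per stratum, rather than the first N frames, keeps the whole clip's time span visible in a single VLM call, which is one plausible reason a grid can be cheaper than querying every frame.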

Contribution

The paper proposes a propose-ground-propagate framework: a VLM generates open-set anomaly proposals, which purpose-built spatial and temporal modules then ground and track, without any domain-specific training.
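
The control flow of propose-ground-propagate can be sketched as three pluggable stages. This is an illustrative skeleton, not GridVAD's code: `Proposal`, `propose_ground_propagate`, and the toy stage functions are hypothetical names, and the real proposer, grounding module, and tracker are whatever the paper uses (this summary does not name them). The point is the division of labor: the VLM only emits text, and dedicated modules turn that text into pixel masks over time.

```python
from dataclasses import dataclass
from typing import Callable, List
import numpy as np

@dataclass
class Proposal:
    description: str   # natural-language anomaly description from the VLM
    frame_index: int   # frame in which the proposal should be grounded

# Hypothetical stage signatures.
Propose = Callable[[np.ndarray], List[Proposal]]                  # video -> proposals
Ground = Callable[[np.ndarray, str], np.ndarray]                  # frame, text -> (H, W) bool mask
Propagate = Callable[[np.ndarray, int, np.ndarray], np.ndarray]   # video, idx, mask -> (T, H, W)

def propose_ground_propagate(video: np.ndarray, propose: Propose,
                             ground: Ground, propagate: Propagate) -> np.ndarray:
    """Return a (T, H, W) boolean anomaly mask volume for a (T, H, W, C) video."""
    t, h, w = video.shape[:3]
    out = np.zeros((t, h, w), dtype=bool)
    for p in propose(video):                                  # 1. VLM proposes candidates
        seed = ground(video[p.frame_index], p.description)    # 2. text -> mask in one frame
        out |= propagate(video, p.frame_index, seed)          # 3. carry mask through the clip
    return out


# Toy stand-ins so the skeleton runs end to end.
def toy_propose(v: np.ndarray) -> List[Proposal]:
    return [Proposal("cyclist in pedestrian zone", 2)]

def toy_ground(frame: np.ndarray, text: str) -> np.ndarray:
    m = np.zeros(frame.shape[:2], dtype=bool)
    m[2:4, 2:4] = True                      # pretend a 2x2 box was found
    return m

def toy_propagate(v: np.ndarray, idx: int, mask: np.ndarray) -> np.ndarray:
    return np.repeat(mask[None], v.shape[0], axis=0)   # static "track"

video = np.zeros((5, 8, 8, 3), dtype=np.uint8)
masks = propose_ground_propagate(video, toy_propose, toy_ground, toy_propagate)
# masks.shape == (5, 8, 8); masks.sum() == 20
```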

Findings

01

Achieves the highest Pixel-AUROC (77.59) on UCSD Ped2 among zero-shot methods.

02

Outperforms fine-tuned methods like TAO in pixel-level anomaly detection.

03

Is 2.7x more call-efficient than uniform per-frame VLM querying.
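
Pixel-AUROC treats every pixel of every evaluated frame as one sample: a predicted anomaly score versus a binary ground-truth label. AUROC is then the probability that a randomly chosen anomalous pixel scores higher than a randomly chosen normal one, with ties counted as one half. The sketch below computes it with the rank-sum (Mann-Whitney U) identity in NumPy. It assumes one common pooled protocol; benchmarks differ (for example, in which frames are included), the paper's exact protocol is not given here, and reported figures such as 77.59 are presumably on a 0 to 100 scale. `auroc` and `pixel_auroc` are hypothetical helper names.

```python
import numpy as np

def auroc(scores, labels) -> float:
    """ROC AUC via the rank-sum identity, using average ranks for tied scores."""
    s = np.asarray(scores, dtype=float).ravel()
    y = np.asarray(labels, dtype=bool).ravel()
    n_pos = int(y.sum())
    n_neg = y.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both anomalous and normal pixels")
    order = np.argsort(s, kind="mergesort")
    sorted_s = s[order]
    # Tie groups are contiguous after sorting; each gets its mean 1-based rank.
    _, first, counts = np.unique(sorted_s, return_index=True, return_counts=True)
    ranks = np.empty(s.size, dtype=float)
    ranks[order] = np.repeat(first + (counts + 1) / 2.0, counts)
    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def pixel_auroc(score_maps: np.ndarray, gt_masks: np.ndarray) -> float:
    """Pool all pixels of (T, H, W) score maps and ground-truth masks into one AUROC."""
    if score_maps.shape != gt_masks.shape:
        raise ValueError("score maps and masks must have the same shape")
    return auroc(score_maps.ravel(), gt_masks.ravel())


print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))   # 0.75
```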

Abstract

Vision-Language Models (VLMs) are powerful open-set reasoners, yet their direct use as anomaly detectors in video surveillance is fragile: without calibrated anomaly priors, they alternate between missed detections and hallucinated false alarms. We argue the problem is not the VLM itself but how it is used. VLMs should function as anomaly proposers, generating open-set candidate descriptions that are then grounded and tracked by purpose-built spatial and temporal modules. We instantiate this propose-ground-propagate principle in GridVAD, a training-free pipeline that produces pixel-level anomaly masks without any domain-specific training. A VLM reasons over stratified grid representations of video clips to generate natural-language anomaly proposals. Self-Consistency Consolidation (SCC) filters hallucinations by retaining only proposals that recur across multiple independent samplings.…
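
Self-Consistency Consolidation, as described above, keeps only proposals that recur across several independent VLM samplings. The sketch below is one simple way to realize that voting idea, under explicit assumptions: proposals from different samplings are matched by word-level Jaccard similarity (a stand-in; the summary does not say how GridVAD matches proposals), and `self_consistency_filter`, `sim_threshold`, and `min_votes` are hypothetical names and parameters. Each cluster's vote is the number of distinct samplings that contributed to it, so a proposal repeated within a single sampling cannot vote for itself twice.

```python
import re
from typing import Dict, FrozenSet, List

def _tokens(text: str) -> FrozenSet[str]:
    """Lowercased alphanumeric word set of a proposal."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))

def _jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def self_consistency_filter(samples: List[List[str]], min_votes: int,
                            sim_threshold: float = 0.5) -> List[str]:
    """Return proposals that recur in at least `min_votes` independent samplings.

    Proposals are greedily clustered against each cluster's first member;
    the first member's text is returned as the cluster's representative.
    """
    clusters: List[Dict] = []
    for sample_id, proposals in enumerate(samples):
        for text in proposals:
            toks = _tokens(text)
            for c in clusters:
                if _jaccard(toks, c["rep"]) >= sim_threshold:
                    c["votes"].add(sample_id)   # count samplings, not mentions
                    break
            else:  # no sufficiently similar cluster: start a new one
                clusters.append({"rep": toks, "text": text, "votes": {sample_id}})
    return [c["text"] for c in clusters if len(c["votes"]) >= min_votes]


samples = [
    ["person riding a bicycle on the walkway", "dog near bench"],
    ["Person riding bicycle on walkway"],
    ["a person riding a bicycle on the walkway", "car parked on grass"],
]
print(self_consistency_filter(samples, min_votes=2))
# ['person riding a bicycle on the walkway']
```

One-off proposals ("dog near bench", "car parked on grass") appear in a single sampling and are dropped, which mirrors the intended effect of filtering hallucinations that the VLM does not reproduce consistently.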

Code & Models

Repositories

https://gridvad.github.io