VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
Rohit Bharadwaj, Hanan Gani, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan

TL;DR
VANE-Bench introduces a comprehensive benchmark dataset and evaluation framework to assess the ability of Video-LMMs in detecting and localizing anomalies in videos, highlighting current limitations and guiding future improvements.
Contribution
This paper presents VANE-Bench, the first benchmark for evaluating Video-LMMs on anomaly detection and localization tasks in videos, including synthetic and real-world data.
Findings
Most Video-LMMs struggle with subtle anomalies
Benchmark reveals significant room for improvement in anomaly detection
Synthetic and real-world datasets provide diverse evaluation scenarios
Abstract
The recent developments in Large Multi-modal Video Models (Video-LMMs) have significantly enhanced our ability to interpret and analyze video data. Despite their impressive capabilities, current Video-LMMs have not been evaluated for anomaly detection tasks, which is critical to their deployment in practical scenarios e.g., towards identifying deepfakes, manipulated video content, traffic accidents and crimes. In this paper, we introduce VANE-Bench, a benchmark designed to assess the proficiency of Video-LMMs in detecting and localizing anomalies and inconsistencies in videos. Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models, encompassing a variety of subtle anomalies and inconsistencies grouped into five categories: unnatural transformations, unnatural appearance, pass-through, disappearance and sudden…
Taxonomy
Topics: Anomaly Detection Techniques and Applications
