Can VLMs Unlock Semantic Anomaly Detection? A Framework for Structured Reasoning
Roberto Brusnicki, David Pop, Yuan Gao, Mattia Piccinini, Johannes Betz

TL;DR
This paper introduces SAVANT, a model-agnostic framework that improves VLM-based semantic anomaly detection in autonomous driving through structured reasoning, significantly increasing detection recall and enabling scalable data annotation.
Contribution
The paper presents SAVANT, a novel layered reasoning framework that improves VLM-based anomaly detection and facilitates large-scale data annotation for autonomous driving scenarios.
Findings
SAVANT improves VLM recall by approximately 18.5% over prompting baselines.
The framework enables automatic labeling of around 10,000 images with high confidence.
Fine-tuned models achieve 90.8% recall and 93.8% accuracy in anomaly detection.
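For readers less familiar with the two metrics quoted in these findings, the following is a minimal sketch of how recall and accuracy are conventionally computed for a binary anomaly detector. The function name and the toy labels are illustrative assumptions; none of this is the paper's evaluation code or data.

```python
from typing import Sequence, Tuple


def recall_and_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (recall, accuracy) for binary labels, where 1 means 'anomalous'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    # Recall: share of true anomalies that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all scenes classified correctly.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return recall, accuracy


# Toy labels (made up for illustration only).
recall, accuracy = recall_and_accuracy([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Recall is the metric that penalizes missed anomalies, which is why it is usually the headline number for safety-oriented detection.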
Abstract
Autonomous driving systems remain critically vulnerable to the long-tail of rare, out-of-distribution semantic anomalies. While VLMs have emerged as promising tools for perception, their application in anomaly detection remains largely restricted to prompting proprietary models - limiting reliability, reproducibility, and deployment feasibility. To address this gap, we introduce SAVANT (Semantic Anomaly Verification/Analysis Toolkit), a novel model-agnostic reasoning framework that reformulates anomaly detection as a layered semantic consistency verification. By applying SAVANT's two-phase pipeline - structured scene description extraction and multi-modal evaluation - existing VLMs improve their scores in detecting anomalous driving scenarios from input images. Our approach replaces ad hoc prompting with semantic-aware reasoning, transforming VLM-based detection into a principled…
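The two-phase pipeline named in the abstract (structured scene description extraction, then multi-modal evaluation) might be sketched roughly as follows. This is an illustration under stated assumptions, not SAVANT's implementation: the `VLM` callable interface, the layer names in `LAYERS`, the prompt wording, and the YES/NO answer format are all hypothetical placeholders.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical model interface: (text prompt, optional image bytes) -> text reply.
VLM = Callable[[str, Optional[bytes]], str]

# Placeholder semantic layers; not taken from the paper.
LAYERS = ("road", "objects", "environment")


def describe_scene(vlm: VLM, image: Optional[bytes],
                   layers: Sequence[str] = LAYERS) -> Dict[str, str]:
    """Phase 1: request one structured description per semantic layer."""
    return {
        layer: vlm(f"Describe the {layer} in this driving scene.", image)
        for layer in layers
    }


def evaluate_scene(vlm: VLM, image: Optional[bytes],
                   description: Dict[str, str]) -> bool:
    """Phase 2: judge consistency using both the image and the text description."""
    summary = "\n".join(f"{layer}: {text}" for layer, text in description.items())
    prompt = (
        "Given the image and the layered description below, is anything "
        "semantically inconsistent? Answer YES or NO.\n" + summary
    )
    answer = vlm(prompt, image)
    return answer.strip().upper().startswith("YES")


def detect_anomaly(vlm: VLM, image: Optional[bytes]) -> bool:
    """Run both phases and return True if the scene is flagged as anomalous."""
    description = describe_scene(vlm, image)
    return evaluate_scene(vlm, image, description)
```

Because the model is passed in as a plain callable, the same two functions can wrap any VLM backend, which mirrors the model-agnostic framing in the abstract.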
