Can VLMs Unlock Semantic Anomaly Detection? A Framework for Structured Reasoning

Roberto Brusnicki; David Pop; Yuan Gao; Mattia Piccinini; Johannes Betz

arXiv:2510.18034·cs.CV·May 21, 2026

TL;DR

This paper introduces SAVANT, a model-agnostic framework that applies structured reasoning to vision-language models (VLMs) for semantic anomaly detection in autonomous driving, improving detection recall and enabling scalable data annotation.

Contribution

The paper presents SAVANT, a novel layered reasoning framework that improves VLM-based anomaly detection and facilitates large-scale data annotation for autonomous driving scenarios.

Findings

01. SAVANT improves VLM recall by approximately 18.5% over prompting baselines.
02. The framework enables automatic labeling of around 10,000 images with high confidence.
03. Fine-tuned models achieve 90.8% recall and 93.8% accuracy in anomaly detection.

Abstract

Autonomous driving systems remain critically vulnerable to the long tail of rare, out-of-distribution semantic anomalies. While VLMs have emerged as promising tools for perception, their application in anomaly detection remains largely restricted to prompting proprietary models, which limits reliability, reproducibility, and deployment feasibility. To address this gap, we introduce SAVANT (Semantic Anomaly Verification/Analysis Toolkit), a novel model-agnostic reasoning framework that reformulates anomaly detection as layered semantic consistency verification. By applying SAVANT's two-phase pipeline (structured scene description extraction followed by multi-modal evaluation), existing VLMs improve their scores in detecting anomalous driving scenarios from input images. Our approach replaces ad hoc prompting with semantic-aware reasoning, transforming VLM-based detection into a principled…
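The two-phase idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the layer names, the prompt wording, the `Verdict` type, and the functions `describe_scene`, `evaluate_scene`, and `detect` are hypothetical and are not taken from the SAVANT code. The VLM is modeled as any callable that takes image bytes and a prompt and returns text, which keeps the sketch model-agnostic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Any VLM backend is modeled as: (image bytes, prompt) -> text answer.
VLM = Callable[[bytes, str], str]

# Illustrative layer names only; the abstract says "layered" without listing them.
LAYERS = ("road_layout", "static_objects", "dynamic_agents", "environment")


@dataclass
class Verdict:
    anomalous: bool
    descriptions: Dict[str, str]
    rationale: str


def describe_scene(vlm: VLM, image: bytes,
                   layers: Sequence[str] = LAYERS) -> Dict[str, str]:
    """Phase 1 (assumed form): one structured description per layer."""
    return {
        layer: vlm(image, f"Describe only the {layer} of this driving scene.")
        for layer in layers
    }


def evaluate_scene(vlm: VLM, image: bytes,
                   descriptions: Dict[str, str]) -> Verdict:
    """Phase 2 (assumed form): judge the image together with the descriptions."""
    summary = "\n".join(f"{name}: {text}" for name, text in descriptions.items())
    prompt = (
        "Given the image and these layer descriptions, is anything semantically "
        "inconsistent for a driving scene? Start with ANOMALY or NORMAL, then "
        "explain.\n" + summary
    )
    answer = vlm(image, prompt)
    return Verdict(
        anomalous=answer.strip().upper().startswith("ANOMALY"),
        descriptions=descriptions,
        rationale=answer,
    )


def detect(vlm: VLM, image: bytes) -> Verdict:
    """Run both phases in sequence."""
    return evaluate_scene(vlm, image, describe_scene(vlm, image))


def stub_vlm(image: bytes, prompt: str) -> str:
    """Stand-in for a real model so the sketch runs without any dependency."""
    if prompt.startswith("Describe"):
        return "nothing unusual"
    return "ANOMALY: a stop sign is printed on a billboard"


verdict = detect(stub_vlm, b"")
print(verdict.anomalous, verdict.rationale)
```

Swapping `stub_vlm` for a wrapper around a real model is the only change needed to try the sketch on actual images; parsing a leading ANOMALY/NORMAL token is a simplification chosen here, not a documented SAVANT output format.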

Code & Models

Repositories

https://TUM-AVS.github.io/SAVANT