Vision Foundation Model Embedding-Based Semantic Anomaly Detection

Max Peter Ronecker; Matthew Foutter; Amine Elhafsi; Daniele Gammelli; Ihor Barakaiev; Marco Pavone; Daniel Watzenig

arXiv:2505.07998·cs.CV·May 14, 2025

TL;DR

This paper detects semantic anomalies in autonomous systems by comparing vision foundation model embeddings of runtime images against a database of nominal scenarios, with a filtering step that suppresses false positives.

Contribution

The paper introduces a framework that uses local vision embeddings for semantic anomaly detection, in two variants (raw grid-based and object-centric via instance segmentation), plus a filtering mechanism for improved robustness.

Findings

01 Instance-based method with filtering matches GPT-4o performance

02 Framework provides precise anomaly localization

03 Effective in simulated autonomous driving scenarios

Abstract

Semantic anomalies are contextually invalid or unusual combinations of familiar visual elements that can cause undefined behavior and failures in system-level reasoning for autonomous systems. This work explores semantic anomaly detection by leveraging the semantic priors of state-of-the-art vision foundation models, operating directly on the image. We propose a framework that compares local vision embeddings from runtime images to a database of nominal scenarios in which the autonomous system is deemed safe and performant. In this work, we consider two variants of the proposed framework: one using raw grid-based embeddings, and another leveraging instance segmentation for object-centric representations. To further improve robustness, we introduce a simple filtering mechanism to suppress false positives. Our evaluations on CARLA-simulated anomalies show that the instance-based method…
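The comparison described in the abstract can be illustrated with a minimal sketch for the grid-based variant. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`anomaly_scores`, `filter_isolated`), the use of nearest-neighbor cosine similarity as the comparison, the threshold value, and the neighbor-count filter are all hypothetical, and the embeddings are synthetic vectors standing in for the output of a vision foundation model.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale vectors along the last axis to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def anomaly_scores(runtime_emb, nominal_db):
    """Score each grid cell by its distance to the closest nominal embedding.

    runtime_emb: (H, W, D) local embeddings of one runtime image.
    nominal_db:  (N, D) embeddings collected from nominal scenarios.
    Returns an (H, W) array of 1 - max cosine similarity (assumed metric).
    """
    h, w, d = runtime_emb.shape
    queries = l2_normalize(runtime_emb.reshape(-1, d))
    database = l2_normalize(nominal_db)
    sims = queries @ database.T            # (H*W, N) cosine similarities
    return (1.0 - sims.max(axis=1)).reshape(h, w)

def filter_isolated(mask, min_neighbors=2):
    """Hypothetical false-positive filter: keep a flagged cell only if at
    least `min_neighbors` of its 8 surrounding cells are also flagged."""
    h, w = mask.shape
    padded = np.pad(mask.astype(int), 1)
    counts = sum(
        padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    return mask & (counts >= min_neighbors)

# Synthetic example: the nominal database holds four basis directions.
dim = 8
nominal_db = np.eye(dim)[:4]

# A 6x6 runtime grid that mostly looks nominal ...
runtime_emb = np.tile(np.eye(dim)[0], (6, 6, 1))
# ... with a 2x2 region and one stray cell pointing in an unseen direction.
runtime_emb[1:3, 1:3] = np.eye(dim)[7]
runtime_emb[5, 5] = np.eye(dim)[7]

scores = anomaly_scores(runtime_emb, nominal_db)
raw_mask = scores > 0.5                    # assumed threshold
filtered_mask = filter_isolated(raw_mask)

print(int(raw_mask.sum()), int(filtered_mask.sum()))
```

In this toy setup the contiguous 2x2 region survives filtering while the single stray cell is discarded, which shows how a spatial consistency check could trade a little sensitivity for fewer false positives. An object-centric variant would presumably pool embeddings per segmented instance before the same database lookup, but that step is not sketched here.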
