Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving

Mert Keser; Halil Ibrahim Orhan; Niki Amini-Naieni; Gesina Schwalbe; Alois Knoll; Matthias Rottmann

arXiv:2501.08083·cs.CV·May 13, 2026

TL;DR

This paper evaluates the effectiveness of Vision Foundation Models combined with density estimation techniques for unsupervised out-of-distribution detection in autonomous driving, demonstrating superior performance over existing methods.

Contribution

It introduces a systematic benchmark of VFMs with density modeling for OOD detection in complex driving scenarios, highlighting their potential for safety monitoring.

Findings

01. VFM embeddings with density estimation outperform existing OOD detection methods.

02. The proposed framework effectively detects high-risk inputs that could cause errors.

03. Systematic evaluation across diverse conditions confirms the robustness of the approach.

Abstract

Deep neural networks (DNNs) remain challenged by distribution shifts in complex open-world domains like automated driving (AD): robustness against as-yet-unknown objects (semantic shift) or styles such as lighting conditions (covariate shift) cannot be guaranteed. Hence, reliable operation-time monitors that identify out-of-training-data-distribution (OOD) scenarios are imperative. Current approaches for OOD classification are untested for complex domains like AD, are limited in the kinds of shifts they detect, or even require supervision with OOD samples. To prepare for unanticipated shifts, we instead establish a framework around a principled, unsupervised and model-agnostic method that unifies detection of semantic and covariate shifts: fit a full model of the training data's feature distribution, then use its density at new points as an in-distribution (ID) score. To…
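The core idea in the abstract, modeling the training feature distribution and scoring new inputs by their density, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the features are synthetic random vectors standing in for VFM embeddings, a single multivariate Gaussian stands in for whatever density models the paper actually benchmarks, and the names `fit_gaussian` and `log_density` are hypothetical.

```python
import numpy as np


def fit_gaussian(features, reg=1e-6):
    """Fit one multivariate Gaussian to in-distribution feature rows."""
    mean = features.mean(axis=0)
    centered = features - mean
    cov = centered.T @ centered / (len(features) - 1)
    cov += reg * np.eye(features.shape[1])  # small ridge keeps cov invertible
    return mean, cov


def log_density(x, mean, cov):
    """Gaussian log-density per row of x; higher means more in-distribution."""
    dim = mean.shape[0]
    diff = x - mean
    solved = np.linalg.solve(cov, diff.T).T      # cov^{-1} (x - mean)
    maha = np.sum(diff * solved, axis=1)         # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + dim * np.log(2 * np.pi))


# Synthetic stand-ins for embeddings (assumption: real ones come from a VFM).
rng = np.random.default_rng(0)
train_feats = rng.normal(0.0, 1.0, size=(2000, 16))  # training (ID) features
id_feats = rng.normal(0.0, 1.0, size=(200, 16))      # held-out ID features
ood_feats = rng.normal(4.0, 1.0, size=(200, 16))     # shifted, i.e. OOD

mean, cov = fit_gaussian(train_feats)
id_scores = log_density(id_feats, mean, cov)
ood_scores = log_density(ood_feats, mean, cov)

# Flag inputs whose score falls below the 5th percentile of training scores.
threshold = np.quantile(log_density(train_feats, mean, cov), 0.05)
ood_flagged = ood_scores < threshold
```

No OOD samples are used to fit the model or choose the threshold, which is what makes the approach unsupervised in the sense the abstract describes. The 5% quantile is an arbitrary illustrative choice, not a value taken from the paper.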
