A Semantic Observer Layer for Autonomous Vehicles: Pre-Deployment Feasibility Study of VLMs for Low-Latency Anomaly Detection

Kunal Runwal; Swaraj Gajare; Daniel Adejumo; Omkar Ankalkope; Siddhant Baroth; and Aliasghar Arab

arXiv:2603.28888·cs.RO·April 1, 2026

TL;DR

This paper proposes a semantic observer layer: a quantized vision-language model (VLM) that runs alongside the primary autonomous-vehicle control loop, detects semantic anomalies at low latency (~500 ms per inference), and triggers fail-safe handoffs when one is found.

Contribution

It demonstrates the feasibility of deploying a low-latency, quantized VLM-based semantic observer for anomaly detection in autonomous vehicles.

Findings

01. Achieved ~500 ms inference with NVFP4 quantization and FlashAttention2, a ~50x speedup over the unoptimized FP16 baseline.

02. Identified NF4 recall collapse (10.6%) as a hard deployment constraint.

03. Mapped performance metrics to safety goals through a hazard analysis.

Abstract

Semantic anomalies, context-dependent hazards that pixel-level detectors cannot reason about, pose a critical safety risk in autonomous driving. We propose a semantic observer layer: a quantized vision-language model (VLM) running at 1–2 Hz alongside the primary AV control loop, monitoring for semantic edge cases and triggering fail-safe handoffs when one is detected. Using NVIDIA Cosmos-Reason1-7B with NVFP4 quantization and FlashAttention2, we achieve ~500 ms inference, a ~50x speedup over the unoptimized FP16 baseline (no quantization, standard PyTorch attention) on the same hardware, satisfying the observer timing budget. We benchmark accuracy, latency, and quantization behavior in static and video conditions, identify NF4 recall collapse (10.6%) as a hard deployment constraint, and present a hazard analysis mapping performance metrics to safety goals. The results establish a pre-deployment…
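The timing claim in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and not taken from the paper: it assumes the observer's budget equals one period of its 1–2 Hz rate, treats the approximate figures (~500 ms, ~50x) as exact, and uses hypothetical function names. The baseline latency it prints is implied by those two figures, not a number reported in the abstract.

```python
# Illustrative arithmetic only. Assumes the timing budget is one period of the
# observer rate; the figures are the approximate values quoted in the abstract.

INFERENCE_MS = 500.0   # ~500 ms optimized inference (from the abstract)
SPEEDUP = 50.0         # ~50x over the unoptimized FP16 baseline (from the abstract)


def period_ms(rate_hz: float) -> float:
    """Time available per observer tick at the given rate."""
    return 1000.0 / rate_hz


def fits_budget(inference_ms: float, rate_hz: float) -> bool:
    """True if one inference completes within one observer period."""
    return inference_ms <= period_ms(rate_hz)


def implied_baseline_ms(optimized_ms: float, speedup: float) -> float:
    """Baseline latency implied by the optimized latency and the speedup factor."""
    return optimized_ms * speedup


if __name__ == "__main__":
    for rate_hz in (1.0, 2.0):
        headroom = period_ms(rate_hz) - INFERENCE_MS
        print(f"{rate_hz:.0f} Hz: period {period_ms(rate_hz):.0f} ms, "
              f"fits={fits_budget(INFERENCE_MS, rate_hz)}, headroom {headroom:.0f} ms")

    baseline = implied_baseline_ms(INFERENCE_MS, SPEEDUP)
    print(f"Implied FP16 baseline: about {baseline / 1000:.0f} s per inference")
    print(f"Baseline fits 1 Hz budget: {fits_budget(baseline, 1.0)}")
```

Under these assumptions the optimized model leaves no headroom at 2 Hz and about half a period at 1 Hz, while the implied baseline of roughly 25 s per inference misses the budget at either rate.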
