Semantic Anomaly Detection with Large Language Models
Amine Elhafsi, Rohan Sinha, Christopher Agia, Edward Schmerling, Issa Nesnas, Marco Pavone

TL;DR
This paper explores using large language models to detect semantic anomalies in autonomous systems, aiming to improve system safety by recognizing edge cases that challenge current reasoning capabilities.
Contribution
It introduces a novel framework leveraging LLMs for semantic anomaly detection in vision-based policies, demonstrating effectiveness in autonomous driving and object manipulation tasks.
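As a rough illustration of what such a monitor might look like, the Python sketch below turns each scene observation into a yes/no question for an LLM and flags the observations the model considers risky. It is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wording, the `Verdict` type, the `build_prompt` and `monitor_scene` helpers, and the `toy_llm` stand-in are all hypothetical, and the step that converts camera images into text observations is assumed to happen upstream.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    observation: str   # text description of one thing the camera sees
    anomalous: bool    # True if the LLM reply starts with "yes"
    raw_reply: str     # full LLM reply, kept for inspection


def build_prompt(task: str, observation: str) -> str:
    """Format one observation into a yes/no question (hypothetical wording)."""
    return (
        f"You monitor a robot performing this task: {task}.\n"
        f"The camera sees: {observation}.\n"
        "Could this observation cause the robot to behave unsafely or "
        "incorrectly? Answer 'yes' or 'no', then explain briefly."
    )


def monitor_scene(
    task: str, observations: List[str], llm: Callable[[str], str]
) -> List[Verdict]:
    """Query the LLM once per observation and parse a yes/no verdict."""
    verdicts = []
    for obs in observations:
        reply = llm(build_prompt(task, obs))
        anomalous = reply.strip().lower().startswith("yes")
        verdicts.append(Verdict(obs, anomalous, reply))
    return verdicts


def toy_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; keys on one phrase for demonstration only."""
    if "billboard" in prompt:
        return "Yes. The stop sign is an image, not a real traffic control."
    return "No. This is an ordinary observation."


verdicts = monitor_scene(
    "drive through an intersection",
    ["a stop sign at the corner",
     "an image of a stop sign on a roadside billboard"],
    toy_llm,
)
flagged = [v.observation for v in verdicts if v.anomalous]
print(flagged)
```

In practice `toy_llm` would be replaced by a call to an actual language model, and the reply parsing would likely need to be more robust than a prefix check.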
Findings
LLM-based monitors effectively identify semantic anomalies.
The monitors' anomaly judgments align well with human reasoning.
The discussion highlights the strengths and limitations of using foundation models for this task.
Abstract
As robots acquire increasingly sophisticated skills and see increasingly complex and varied environments, the threat of an edge case or anomalous failure is ever present. For example, Tesla cars have seen interesting failure modes ranging from autopilot disengagements due to inactive traffic lights carried by trucks to phantom braking caused by images of stop signs on roadside billboards. These system-level failures are not due to failures of any individual component of the autonomy stack but rather system-level deficiencies in semantic reasoning. Such edge cases, which we call semantic anomalies, are simple for a human to disentangle yet require insightful reasoning. To this end, we study the application of large language models (LLMs), endowed with broad contextual understanding and reasoning capabilities, to recognize such edge cases and introduce a monitoring framework for semantic…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Software System Performance and Reliability · Network Security and Intrusion Detection
