TL;DR
This paper investigates the robustness of black-box LLM planners under sensor noise and prompt perturbations, proposing an adaptive stress testing method to identify failure scenarios in safety-critical environments.
Contribution
It introduces a novel adaptive stress testing approach that uses Monte-Carlo tree search (MCTS) to efficiently explore the space of prompt and sensor perturbations affecting LLM decision-making.
Findings
LLM planners hallucinate under various perturbations in driving scenarios.
The proposed method identifies high-uncertainty and failure scenarios proactively.
Offline analysis with MCTS prompt trees reveals potential runtime failures.
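To make the idea of searching a perturbation space with MCTS concrete, here is a minimal sketch of a UCT-style search over sequences of perturbations. It is not the paper's implementation: the action names in `ACTIONS`, the `toy_uncertainty` scoring function (a stand-in for querying a real LLM planner and measuring how uncertain or unsafe its output is), and the helpers `search` and `rank_scenarios` are all hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical perturbation actions; the names are illustrative assumptions.
ACTIONS = ["shuffle_details", "drop_few_shot", "add_sensor_noise", "drop_sensor"]


def toy_uncertainty(perturbations):
    """Stand-in for an LLM planner query: returns a score in [0, 1].

    Assumption: more perturbations raise uncertainty, and combining sensor
    noise with missing few-shot examples is especially harmful.
    """
    score = 0.1 * len(perturbations)
    if "add_sensor_noise" in perturbations and "drop_few_shot" in perturbations:
        score += 0.5
    return min(score, 1.0)


class Node:
    def __init__(self, perturbations=()):
        self.perturbations = perturbations  # sequence applied so far
        self.children = {}                  # action name -> child Node
        self.visits = 0
        self.total = 0.0                    # sum of rewards backed up here

    def untried(self):
        return [a for a in ACTIONS
                if a not in self.perturbations and a not in self.children]


def uct_child(node, c=1.0):
    """Pick the child maximizing mean reward plus an exploration bonus."""
    return max(
        node.children.values(),
        key=lambda ch: ch.total / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def search(iterations=200, max_depth=3, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded.
        while (len(node.perturbations) < max_depth
               and not node.untried() and node.children):
            node = uct_child(node)
            path.append(node)
        # Expansion: add one untried perturbation, if depth allows.
        if len(node.perturbations) < max_depth and node.untried():
            action = rng.choice(node.untried())
            child = Node(node.perturbations + (action,))
            node.children[action] = child
            node = child
            path.append(node)
        # Evaluation and backpropagation.
        reward = toy_uncertainty(node.perturbations)
        for visited in path:
            visited.visits += 1
            visited.total += reward
    return root


def rank_scenarios(root):
    """Return (mean reward, perturbation sequence) pairs, worst case first."""
    ranked, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.visits and node.perturbations:
            ranked.append((node.total / node.visits, node.perturbations))
        stack.extend(node.children.values())
    return sorted(ranked, reverse=True)
```

Calling `rank_scenarios(search())` lists the perturbation sequences the search found most damaging under the toy score. In a real setting the tree could be built offline and inspected for the branches most likely to cause failures at runtime, which is the kind of analysis the findings describe.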
Abstract
Large language models (LLMs) have recently demonstrated success in decision-making tasks including planning, control, and prediction, but their tendency to hallucinate unsafe and undesired outputs poses risks. This unwanted behavior is further exacerbated in environments where sensors are noisy or unreliable. Characterizing the behavior of LLM planners to varied observations is necessary to proactively avoid failures in safety-critical scenarios. We specifically investigate the response of LLMs along two different perturbation dimensions. Like prior works, one dimension generates semantically similar prompts with varied phrasing by randomizing order of details, modifying access to few-shot examples, etc. Unique to our work, the second dimension simulates access to varied sensors and noise to mimic raw sensor or detection algorithm failures. An initial case study in which perturbations…
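The two perturbation dimensions described in the abstract can be illustrated with a small sketch. This is only an assumption about how such perturbations might be coded, not the authors' implementation: the function names `perturb_prompt` and `perturb_sensors`, the sensor names, and the Gaussian noise model are all hypothetical.

```python
import random


def perturb_prompt(details, few_shot, rng, keep_few_shot=True):
    """Prompt dimension: reorder scene details, optionally withhold few-shot examples."""
    shuffled = list(details)
    rng.shuffle(shuffled)
    parts = (list(few_shot) if keep_few_shot else []) + shuffled
    return "\n".join(parts)


def perturb_sensors(readings, rng, dropped=(), noise_std=0.0):
    """Sensor dimension: drop the named sensors, add zero-mean Gaussian noise to the rest."""
    return {
        name: value + rng.gauss(0.0, noise_std)
        for name, value in readings.items()
        if name not in dropped
    }


rng = random.Random(0)
details = ["Ego speed: 10 m/s", "Lead vehicle ahead", "Left lane is free"]
few_shot = ["Example: obstacle ahead -> decelerate"]
readings = {"lidar_range_m": 12.0, "camera_range_m": 11.5}

# A prompt variant with reordered details and no few-shot examples.
prompt = perturb_prompt(details, few_shot, rng, keep_few_shot=False)
# A sensor variant that simulates a camera failure and noisy lidar.
noisy = perturb_sensors(readings, rng, dropped=("camera_range_m",), noise_std=0.5)
```

Keeping the two dimensions as separate functions makes it straightforward to combine them, so that a stress test can vary phrasing and sensor access independently.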
