TL;DR
DRIV-EX introduces a gradient-based method for generating fluent, valid counterfactual explanations of LLM decisions in autonomous driving, revealing biases and enhancing interpretability.
Contribution
The paper combines gradient-based optimization over continuous embeddings with controlled decoding to produce coherent counterfactual scene descriptions for LLM-based driving models.
Findings
DRIV-EX generates more reliable counterfactuals than existing methods.
It exposes latent biases in LLM-based driving models.
The approach improves interpretability and robustness of autonomous driving decisions.
Abstract
Large language models (LLMs) are increasingly used as reasoning engines in autonomous driving, yet their decision-making remains opaque. We propose to study their decision process through counterfactual explanations, which identify the minimal semantic changes to a scene description required to alter a driving plan. We introduce DRIV-EX, a method that leverages gradient-based optimization on continuous embeddings to identify the input shifts required to flip the model's decision. Crucially, to avoid the incoherent text typical of unconstrained continuous optimization, DRIV-EX uses these optimized embeddings solely as a semantic guide: they are used to bias a controlled decoding process that re-generates the original scene description. This approach effectively steers the generation toward the counterfactual target while guaranteeing the linguistic fluency, domain validity, and…
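The two stages described in the abstract (optimize continuous embeddings toward the target decision, then use them only to bias a decoder that re-generates the scene) can be illustrated with a deliberately tiny sketch. This is not the DRIV-EX implementation. Everything in it is an assumption made for illustration: the five-word vocabulary, the hand-written 3-D embeddings, the linear decision head standing in for the driving LLM, the hinge loss with a proximity penalty, and the "keep bonus" standing in for a language model's preference for the original wording.

```python
# Toy sketch only: hypothetical vocabulary, embeddings, and decision head.
VOCAB = ["pedestrian", "car", "is", "near", "far"]
E = [
    [-0.2,  1.0, 0.0],  # pedestrian
    [ 0.2, -1.0, 0.0],  # car
    [ 0.0,  0.0, 1.0],  # is
    [-0.5,  0.0, 0.0],  # near
    [ 0.5,  0.0, 0.0],  # far
]
W = [1.0, 0.0, 0.0]  # assumed linear head: score > 0 means "go", else "brake"


def score(emb):
    """Mean-pool the token embeddings and project them onto the head."""
    n = len(emb)
    pooled = [sum(row[d] for row in emb) / n for d in range(len(W))]
    return sum(p * w for p, w in zip(pooled, W))


def decide(ids):
    return "go" if score([E[i] for i in ids]) > 0 else "brake"


def optimize_embeddings(emb, target_sign, lr=0.5, reg=0.2, steps=100):
    """Gradient descent on the embeddings themselves.

    Loss (an assumption): hinge(1 - target_sign * score) + reg * ||x - emb||^2,
    so the embeddings move toward the target decision but stay near the input.
    """
    x = [row[:] for row in emb]
    n = len(x)
    for _ in range(steps):
        active = target_sign * score(x) < 1.0  # hinge term still contributes
        for i in range(n):
            for d in range(len(W)):
                grad = 2.0 * reg * (x[i][d] - emb[i][d])
                if active:
                    grad -= target_sign * W[d] / n
                x[i][d] -= lr * grad
    return x


def guided_decode(orig_ids, opt_emb, guide=2.0, keep_bonus=1.0):
    """Re-generate the scene token by token.

    Each candidate's logit is a bonus for keeping the original token (a crude
    stand-in for language-model fluency) minus its squared distance to the
    optimized embedding at that position (the semantic guide).
    """
    out = []
    for pos, tok in enumerate(orig_ids):
        logits = []
        for cand, vec in enumerate(E):
            dist2 = sum((a - b) ** 2 for a, b in zip(vec, opt_emb[pos]))
            logits.append((keep_bonus if cand == tok else 0.0) - guide * dist2)
        out.append(max(range(len(E)), key=logits.__getitem__))
    return out


orig_ids = [VOCAB.index(t) for t in ["pedestrian", "is", "near"]]
opt_emb = optimize_embeddings([E[i] for i in orig_ids], target_sign=1.0)
cf_ids = guided_decode(orig_ids, opt_emb)

print(" ".join(VOCAB[i] for i in orig_ids), "->", decide(orig_ids))
print(" ".join(VOCAB[i] for i in cf_ids), "->", decide(cf_ids))
```

In this toy, the optimized embeddings never appear in the output; they only tilt the decoder's choice among real vocabulary items, which is why the result remains a valid sentence. Setting `guide=0.0` reproduces the original description unchanged, and larger values trade fidelity to the original for stronger steering toward the target decision.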
