Explaining Agent Behavior with Large Language Models
Xijia Zhang, Yue Guo, Simon Stepputtis, Katia Sycara, and Joseph Campbell

TL;DR
This paper introduces a method for generating natural language explanations of agent behavior using large language models. It supports interpretability and user interaction while relying only on observed states and actions, not on access to the underlying model.
Contribution
The approach learns a compact behavior representation to produce plausible explanations from observations, facilitating interpretability of complex agents.
Findings
Generated explanations are as helpful as those written by a human domain expert
Enables user interactions like clarification and counterfactual queries
Produces minimal hallucination in explanations
Abstract
Intelligent agents such as robots are increasingly deployed in real-world, safety-critical settings. It is vital that these agents are able to explain the reasoning behind their decisions to human counterparts; however, their behavior is often produced by uninterpretable models such as deep neural networks. We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions, agnostic to the underlying model representation. We show how a compact representation of the agent's behavior can be learned and used to produce plausible explanations with minimal hallucination while affording user interaction with a pre-trained large language model. Through user studies and empirical experiments, we show that our approach generates explanations as helpful as those generated by a human domain expert while enabling beneficial…
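As a rough illustration of the pipeline the abstract describes (observe state-action pairs, compress them into a compact behavior summary, then hand that summary to a language model), the sketch below may help. The abstract does not say what the compact representation is, so a simple per-state frequency table stands in for it here. The function names `summarize_behavior` and `build_prompt`, the example states and actions, and the prompt wording are all hypothetical, and no language model is actually called.

```python
from collections import Counter, defaultdict


def summarize_behavior(trajectory):
    """Map each observed state to the action taken most often in it.

    This frequency table is an assumed stand-in for the paper's learned
    compact representation, which the abstract does not specify.
    """
    counts = defaultdict(Counter)
    for state, action in trajectory:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


def build_prompt(rules, state, action):
    """Format the summary and the queried step as a text prompt (hypothetical wording)."""
    lines = [
        f"- when the state is '{s}', the agent usually does '{a}'"
        for s, a in sorted(rules.items())
    ]
    return (
        "Observed behavior summary:\n"
        + "\n".join(lines)
        + "\n"
        + f"Current state: '{state}'. Action taken: '{action}'.\n"
        + "Explain the action using only the summary above."
    )


# Made-up observations of (state, action) pairs; only these are used,
# never the internals of the agent's model.
trajectory = [
    ("obstacle ahead", "turn left"),
    ("obstacle ahead", "turn left"),
    ("obstacle ahead", "stop"),
    ("path clear", "move forward"),
]

rules = summarize_behavior(trajectory)
prompt = build_prompt(rules, "obstacle ahead", "turn left")
print(prompt)
```

Restricting the prompt to the summarized behavior is one plausible way to keep an explanation grounded in what was observed. Whether the paper limits hallucination this way is not stated in the abstract.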
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
