Explaining Agent Behavior with Large Language Models

Xijia Zhang, Yue Guo, Simon Stepputtis, Katia Sycara, and Joseph Campbell

arXiv:2309.10346 · cs.LG · September 20, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a method that uses large language models to generate natural language explanations of an agent's behavior from observed states and actions alone, supporting interpretability and user interaction without depending on the agent's underlying model representation.

Contribution

The approach learns a compact representation of the agent's behavior from observations and uses it with a pre-trained large language model to produce plausible explanations, making complex agents easier to interpret.

Findings

01. Generated explanations are as helpful as those from a human domain expert

02. Enables user interactions such as clarification and counterfactual queries

03. Produces explanations with minimal hallucination

Abstract

Intelligent agents such as robots are increasingly deployed in real-world, safety-critical settings. It is vital that these agents are able to explain the reasoning behind their decisions to human counterparts; however, their behavior is often produced by uninterpretable models such as deep neural networks. We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions, agnostic to the underlying model representation. We show how a compact representation of the agent's behavior can be learned and used to produce plausible explanations with minimal hallucination while affording user interaction with a pre-trained large language model. Through user studies and empirical experiments, we show that our approach generates explanations as helpful as those generated by a human domain expert while enabling beneficial…
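The pipeline the abstract outlines (observe states and actions, learn a compact behavior representation, hand it to a language model) can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the authors' method: the compact representation is a single-split decision rule, the feature names and actions (`distance_to_victim`, `battery`, `rescue`, `explore`) are hypothetical, and the code only builds the prompt text rather than calling any particular LLM.

```python
from collections import Counter

def fit_stump(observations, feature_names):
    """Find the one (feature, threshold) split that best predicts the action.

    A single split is a deliberately tiny stand-in for a compact behavior
    representation (an assumption; the abstract does not specify its form).
    Returns None if no split separates the observations into two groups.
    """
    best = None
    for i, name in enumerate(feature_names):
        for threshold in sorted({state[i] for state, _ in observations}):
            left = Counter(a for s, a in observations if s[i] <= threshold)
            right = Counter(a for s, a in observations if s[i] > threshold)
            if not left or not right:
                continue  # a split must put observations on both sides
            left_action, left_hits = left.most_common(1)[0]
            right_action, right_hits = right.most_common(1)[0]
            correct = left_hits + right_hits
            if best is None or correct > best["correct"]:
                best = {"index": i, "feature": name, "threshold": threshold,
                        "if_low": left_action, "if_high": right_action,
                        "correct": correct}
    return best

def predict(stump, state):
    """Action the learned rule assigns to a state."""
    low = state[stump["index"]] <= stump["threshold"]
    return stump["if_low"] if low else stump["if_high"]

def build_prompt(stump, state, feature_names):
    """Turn the rule that fired into text a language model could explain."""
    value = state[stump["index"]]
    relation = "<=" if value <= stump["threshold"] else ">"
    facts = ", ".join(f"{n} = {v}" for n, v in zip(feature_names, state))
    return (
        f"Observed state: {facts}.\n"
        f"Behavior rule that fired: {stump['feature']} {relation} {stump['threshold']}.\n"
        f"Action taken: {predict(stump, state)}.\n"
        "Explain in plain language why the agent took this action, "
        "using only the facts above."
    )

# Hypothetical (state, action) observations of a search-and-rescue agent.
feature_names = ["distance_to_victim", "battery"]
observations = [
    ((1.0, 0.9), "rescue"), ((2.0, 0.8), "rescue"), ((1.5, 0.3), "rescue"),
    ((6.0, 0.7), "explore"), ((8.0, 0.9), "explore"), ((7.0, 0.2), "explore"),
]
stump = fit_stump(observations, feature_names)
prompt = build_prompt(stump, (1.2, 0.5), feature_names)
print(prompt)
```

Restricting the prompt to the rule that actually fired is one plausible way to keep an explanation grounded in observed behavior, which is the kind of hallucination control the abstract alludes to. A richer representation would follow the same shape: fit on observations, extract the relevant decision, render it as text.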

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques