Explainability Via Causal Self-Talk
Nicholas A. Roy, Junkyung Kim, Neil Rabinowitz

TL;DR
This paper introduces Causal Self-Talk (CST), a method that trains a deep reinforcement learning agent to build a causal model of itself by communicating with itself across time, so that it can explain its own behavior and expose new interfaces for control.
Contribution
The paper proposes training an AI system to build a causal model of itself as a route to explainability, and develops an instance of this idea for deep RL agents, Causal Self-Talk, implemented in a simulated 3D environment.
Findings
Agents can generate faithful, semantically meaningful explanations.
Learned models enable new semantic control interfaces.
The method improves interpretability without the costs that have kept the wider deep learning community from adopting most XAI techniques.
Abstract
Explaining the behavior of AI systems is an important problem that, in practice, is generally avoided. While the XAI community has been developing an abundance of techniques, most incur a set of costs that the wider deep learning community has been unwilling to pay in most situations. We take a pragmatic view of the issue, and define a set of desiderata that capture both the ambitions of XAI and the practical constraints of deep learning. We describe an effective way to satisfy all the desiderata: train the AI system to build a causal model of itself. We develop an instance of this solution for Deep RL agents: Causal Self-Talk. CST operates by training the agent to communicate with itself across time. We implement this method in a simulated 3D environment, and show how it enables agents to generate faithful and semantically-meaningful explanations of their own behavior. Beyond…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning in Healthcare
