TL;DR
This paper introduces a novel generative deep learning approach to produce counterfactual state explanations for deep reinforcement learning agents in visual environments, improving interpretability for non-expert users.
Contribution
The paper proposes counterfactual state explanations for RL agents and shows, through user studies, that they help non-experts identify flawed agents.
Findings
Counterfactual explanations enable non-experts to better identify flawed agents.
Generated counterfactual states are sufficiently realistic for human interpretation.
Counterfactual explanations outperform nearest neighbor baselines in user studies.
Abstract
Counterfactual explanations, which deal with "why not?" scenarios, can provide insightful explanations of an AI agent's behavior. In this work, we focus on generating counterfactual explanations for deep reinforcement learning (RL) agents that operate in visual input environments like Atari. We introduce counterfactual state explanations, a novel example-based approach to counterfactual explanations based on generative deep learning. Specifically, a counterfactual state illustrates what minimal change is needed to an Atari game image such that the agent chooses a different action. We also evaluate the effectiveness of counterfactual states on human participants who are not machine learning experts. Our first user study investigates whether humans can discern if the counterfactual state explanations are produced by the actual game or by a generative deep learning approach. Our…
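To make the idea of a counterfactual state concrete, here is a minimal sketch of a nearest-neighbor style lookup: given a query frame and the action the agent took, it returns the closest stored frame for which the agent chose a different action. This is not the paper's generative method, and it may not match the exact baseline used in the user studies. The function names, the squared Euclidean distance, and the toy four-pixel "frames" are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two flattened frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_counterfactual(
    query: Sequence[float],
    query_action: int,
    states: Sequence[Sequence[float]],
    actions: Sequence[int],
) -> Optional[int]:
    """Index of the stored state closest to `query` whose recorded action
    differs from `query_action`, or None if no such state exists.

    Hypothetical helper: a stand-in for "smallest change that flips the
    action", restricted to states that were already observed.
    """
    best_index, best_dist = None, float("inf")
    for i, (state, action) in enumerate(zip(states, actions)):
        if action == query_action:
            continue  # same action, so not a counterfactual
        dist = squared_distance(query, state)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


# Toy data: flattened 2x2 "frames" and the (made-up) action taken in each.
frames = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
agent_actions = [0, 0, 1, 1]

query_frame = [0, 0, 0, 0]
idx = nearest_counterfactual(query_frame, 0, frames, agent_actions)
```

In this toy setup the lookup can only return frames that already exist in the stored set. A generative approach, as described in the abstract, instead produces a new image.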
