Counterfactual State Explanations for Reinforcement Learning Agents via Generative Deep Learning

Matthew L. Olson; Roli Khanna; Lawrence Neal; Fuxin Li; Weng-Keen Wong

arXiv:2101.12446·cs.AI·February 1, 2021

TL;DR

This paper introduces a novel generative deep learning approach to produce counterfactual state explanations for deep reinforcement learning agents in visual environments, improving interpretability for non-expert users.

Contribution

The paper proposes counterfactual state explanations for RL agents and shows, through user studies, that they help non-experts identify flawed agents.

Findings

01. Counterfactual explanations enable non-experts to better identify flawed agents.

02. Generated counterfactual states are sufficiently realistic for human interpretation.

03. Counterfactual explanations outperform nearest neighbor baselines in user studies.

Abstract

Counterfactual explanations, which deal with "why not?" scenarios, can provide insightful explanations of an AI agent's behavior. In this work, we focus on generating counterfactual explanations for deep reinforcement learning (RL) agents that operate in visual input environments like Atari. We introduce counterfactual state explanations, a novel example-based approach to counterfactual explanations based on generative deep learning. Specifically, a counterfactual state illustrates the minimal change needed to an Atari game image for the agent to choose a different action. We also evaluate the effectiveness of counterfactual states on human participants who are not machine learning experts. Our first user study investigates whether humans can discern if the counterfactual state explanations are produced by the actual game or by a generative deep learning approach. Our…
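The idea of a counterfactual state (a small change to the input that makes the agent pick a different action) can be illustrated with a toy sketch. This is not the paper's generative approach: it assumes a hand-written linear policy over a two-feature state instead of a deep RL agent on Atari frames, and it searches directly in input space. The names `greedy_action`, `counterfactual_state`, `weights`, and `state` are hypothetical and chosen only for illustration.

```python
import numpy as np


def greedy_action(weights, state):
    """Return the index of the highest-scoring action for a linear policy."""
    return int(np.argmax(weights @ state))


def counterfactual_state(weights, state, target_action, step=0.05, max_steps=1000):
    """Nudge a copy of `state` in small steps until the policy picks `target_action`.

    Returns the modified state, or None if no such state is found.
    """
    current = np.array(state, dtype=float)  # work on a copy; leave the input untouched
    for _ in range(max_steps):
        chosen = greedy_action(weights, current)
        if chosen == target_action:
            return current
        # Direction that raises the target's score relative to the current choice.
        direction = weights[target_action] - weights[chosen]
        norm = np.linalg.norm(direction)
        if norm == 0.0:
            return None  # the two actions score identically everywhere
        current += step * direction / norm
    return None


# Toy policy (an assumption for illustration): 3 actions scored from 2 features.
weights = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [-1.0, -1.0]])
state = np.array([1.0, 0.2])

original_action = greedy_action(weights, state)
cf = counterfactual_state(weights, state, target_action=1)
change_size = np.linalg.norm(cf - state)
print("original action:", original_action)
print("counterfactual action:", greedy_action(weights, cf))
print("size of change:", change_size)
```

In this toy setting the "explanation" is the pair (`state`, `cf`): the difference between them shows what would have had to change for the policy to prefer action 1. In a visual domain, a direct input-space search like this one would not be expected to yield realistic images, which is one motivation for a generative model.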
