Outcome-Guided Counterfactuals for Reinforcement Learning Agents from a Jointly Trained Generative Latent Space
Eric Yeh, Pedro Sequeira, Jesse Hostetler, Melinda Gervasio

TL;DR
This paper introduces a generative approach using a variational autoencoder to produce plausible counterfactuals for reinforcement learning agents by jointly encoding observations and outcomes, improving the interpretability and analysis of agent behavior.
Contribution
The method jointly trains a latent space for observations and outcomes, enabling more realistic counterfactual generation than previous outcome-only or case-based methods.
Findings
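Generated counterfactuals are more plausible and closer to their queries than those from purely outcome-driven or case-based baselines.
Jointly training the latent space on observations and outcomes yields higher-quality counterfactuals.
The method outperforms baseline approaches in three RL environments.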
Abstract
We present a novel generative method for producing unseen and plausible counterfactual examples for reinforcement learning (RL) agents based upon outcome variables that characterize agent behavior. Our approach uses a variational autoencoder to train a latent space that jointly encodes information about the observations and outcome variables pertaining to an agent's behavior. Counterfactuals are generated using traversals in this latent space, via gradient-driven updates as well as latent interpolations against cases drawn from a pool of examples. These include updates to raise the likelihood of generated examples, which improves the plausibility of generated counterfactuals. From experiments in three RL environments, we show that these methods produce counterfactuals that are more plausible and proximal to their queries compared to purely outcome-driven or case-based baselines.…
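The two latent traversals named in the abstract, gradient-driven updates and interpolation toward a stored case, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear outcome head (`w`, `b`) on the latent vector and uses a quadratic pull toward the origin as a rough stand-in for the likelihood-raising updates; the trained encoder and decoder are omitted, and every function and parameter name here is hypothetical.

```python
import numpy as np


def gradient_counterfactual(z, w, b, target, steps=200, lr=0.05, prior_weight=0.1):
    """Shift latent z so the predicted outcome w @ z + b approaches `target`.

    Assumption: the outcome head is linear, so the gradient is analytic.
    The prior_weight term pulls z toward zero (the mean of a standard
    normal prior), a simplified proxy for raising example likelihood.
    """
    z = np.asarray(z, dtype=float).copy()
    for _ in range(steps):
        pred = w @ z + b
        # Gradient of 0.5 * (pred - target)^2 + 0.5 * prior_weight * ||z||^2
        grad = (pred - target) * w + prior_weight * z
        z -= lr * grad
    return z


def interpolate_counterfactual(z_query, z_case, w, b, target, tol=0.15, n=21):
    """Walk from the query latent toward a case latent in n even steps.

    Returns the first point whose predicted outcome is within `tol` of
    `target`, together with the interpolation weight alpha. Falls back
    to the case itself if no intermediate point qualifies.
    """
    for alpha in np.linspace(0.0, 1.0, n):
        z = (1.0 - alpha) * z_query + alpha * z_case
        if abs(w @ z + b - target) <= tol:
            return z, float(alpha)
    return np.asarray(z_case, dtype=float), 1.0


if __name__ == "__main__":
    w = np.array([1.0, 0.0])       # hypothetical outcome head weights
    z_query = np.array([0.0, 1.0])  # latent code of the query observation
    z_case = np.array([2.0, 0.0])   # latent code of a case from the pool

    z_grad = gradient_counterfactual(z_query, w, 0.0, target=1.0)
    z_interp, alpha = interpolate_counterfactual(z_query, z_case, w, 0.0, target=1.0)
    print("gradient-driven latent:", z_grad)
    print("interpolated latent:", z_interp, "alpha:", alpha)
```

In a full system the resulting latent vector would be passed through the decoder to obtain the counterfactual observation. Stopping the interpolation at the first point that reaches the target outcome is one simple way to keep the counterfactual close to the query.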
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling · Explainable Artificial Intelligence (XAI)
Methods: Counterfactual Explanations
