Reconstructing Actions To Explain Deep Reinforcement Learning
Xuan Chen, Zifan Wang, Yucai Fan, Bonan Jin, Piotr Mardziel, Carlee Joe-Wong, Anupam Datta

TL;DR
This paper introduces a novel method for explaining deep reinforcement learning actions through action reconstruction functions, enabling more complex explanations and quantitative evaluation of explainability.
Contribution
It proposes action reconstruction functions for deep RL, introduces the 'agreement' metric, and demonstrates the effectiveness of perturbation-based attribution methods in explaining RL agents.
Findings
Perturbation-based attribution methods outperform alternatives in action reconstruction.
Action reconstruction provides insights into how agents learn in complex games.
The 'agreement' metric effectively evaluates explainability methods.
Abstract
Feature attribution has been a foundational building block for explaining input feature importance in supervised learning with Deep Neural Networks (DNNs), but it faces new challenges when applied to deep Reinforcement Learning (RL). We propose a new approach to explaining deep RL actions by defining a class of 'action reconstruction' functions that mimic the behavior of a network in deep RL. This approach allows us to answer more complex explainability questions than direct application of DNN attribution methods, which we adapt to 'behavior-level attributions' in building our action reconstructions. It also allows us to define 'agreement', a metric for quantitatively evaluating the explainability of our methods. Our experiments on a variety of Atari games suggest that perturbation-based attribution methods are significantly more suitable in reconstructing actions to…
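The summary does not give the paper's exact definitions, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a toy linear Q-network (hypothetical stand-in for an Atari DQN), uses occlusion as the perturbation-based attribution method, treats "reconstruction" as re-querying the network on an input that keeps only the top-k attributed features, and measures agreement as the fraction of states where the reconstructed action matches the agent's action. All function names and the top-k masking scheme are hypothetical illustrations, not the authors' method.

```python
import numpy as np

def q_values(weights, states):
    """Toy linear Q-network (hypothetical): returns one row of Q-values per state."""
    return states @ weights.T

def occlusion_attribution(weights, state, action, baseline=0.0):
    """Perturbation-based attribution: drop in Q(state, action) when each feature is occluded."""
    base_q = q_values(weights, state[None, :])[0, action]
    scores = np.empty_like(state, dtype=float)
    for i in range(state.shape[0]):
        perturbed = state.copy()
        perturbed[i] = baseline  # replace feature i with the baseline value
        scores[i] = base_q - q_values(weights, perturbed[None, :])[0, action]
    return scores

def reconstruct_action(weights, state, k):
    """Hypothetical reconstruction: keep the k most-attributed features, zero the rest, re-query."""
    action = int(np.argmax(q_values(weights, state[None, :])[0]))
    scores = occlusion_attribution(weights, state, action)
    keep = np.argsort(-np.abs(scores))[:k]  # indices of the k largest |attributions|
    masked = np.zeros_like(state)
    masked[keep] = state[keep]
    return int(np.argmax(q_values(weights, masked[None, :])[0]))

def agreement(weights, states, k):
    """Fraction of states where the reconstructed action equals the agent's greedy action."""
    agent_actions = np.argmax(q_values(weights, states), axis=1)
    reconstructed = np.array([reconstruct_action(weights, s, k) for s in states])
    return float(np.mean(agent_actions == reconstructed))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(4, 16))   # 4 actions, 16 input features (toy sizes)
    states = rng.normal(size=(100, 16))
    for k in (2, 4, 8, 16):
        print(k, agreement(weights, states, k))
```

Keeping all features (k equal to the input size) reproduces the original input, so agreement is 1.0 by construction; smaller k tests how well the attribution ranks the features the agent actually relies on. Swapping in a different attribution method only requires replacing `occlusion_attribution`.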
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications
