Explaining Deep Reinforcement Learning Agents In The Atari Domain through a Surrogate Model

Alexander Sieusahai; Matthew Guzdial

arXiv:2110.03184·cs.LG·October 8, 2021

Open Access

TL;DR

This paper introduces a method to explain deep reinforcement learning agents in Atari games by transforming inputs and training an interpretable surrogate model that accurately mimics the agent's decisions.

Contribution

The paper presents a lightweight approach combining input transformation and surrogate modeling to enhance explainability of deep RL agents in the Atari domain.

Findings

1. Surrogate models accurately replicate agent behavior.
2. Input transformation improves interpretability.
3. The method is effective across multiple Atari games.

Abstract

One major barrier to applications of deep Reinforcement Learning (RL) both inside and outside of games is the lack of explainability. In this paper, we describe a lightweight and effective method to derive explanations for deep RL agents, which we evaluate in the Atari domain. Our method relies on a transformation of the pixel-based input of the RL agent to an interpretable, percept-like input representation. We then train a surrogate model, which is itself interpretable, to replicate the behavior of the target, deep RL agent. Our experiments demonstrate that we can learn an effective surrogate that accurately approximates the underlying decision making of a target agent on a suite of Atari games.
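
The two steps described in the abstract (transform pixel input into a percept-like representation, then fit an interpretable surrogate to the agent's actions) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the Pong-like grid frames, the hand-written `black_box_agent` standing in for a trained deep RL policy, the `to_percept` features, and the choice of a small greedy decision tree as the surrogate. Fidelity is measured as the fraction of held-out frames on which the surrogate picks the same action as the agent.

```python
import random
from collections import Counter

HEIGHT, WIDTH = 21, 8
BALL, PADDLE = 1, 2
NAMES = ("ball_y", "paddle_y", "ball_y - paddle_y")

def make_frame(rng):
    """Toy 'pixel' frame: a grid with one ball cell and one paddle cell."""
    frame = [[0] * WIDTH for _ in range(HEIGHT)]
    frame[rng.randrange(HEIGHT)][rng.randrange(1, WIDTH)] = BALL
    frame[rng.randrange(HEIGHT)][0] = PADDLE
    return frame

def find_row(frame, value):
    return next(y for y, row in enumerate(frame) if value in row)

def black_box_agent(frame):
    """Hypothetical stand-in for a deep RL policy that acts on raw frames."""
    ball_y, paddle_y = find_row(frame, BALL), find_row(frame, PADDLE)
    if ball_y > paddle_y + 2:
        return "DOWN"
    if ball_y < paddle_y - 2:
        return "UP"
    return "NOOP"

def to_percept(frame):
    """Assumed input transformation: pixels -> named object-level features."""
    ball_y, paddle_y = find_row(frame, BALL), find_row(frame, PADDLE)
    return (ball_y, paddle_y, ball_y - paddle_y)

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=3):
    """Greedy decision tree; a leaf is an action, a node is (feature, threshold, left, right)."""
    if depth == max_depth or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], depth + 1, max_depth),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], depth + 1, max_depth))

def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def describe(tree, indent=""):
    """Render the surrogate as nested if/else rules a person can read."""
    if not isinstance(tree, tuple):
        return f"{indent}-> {tree}\n"
    f, t, left, right = tree
    return (f"{indent}if {NAMES[f]} <= {t}:\n" + describe(left, indent + "  ")
            + f"{indent}else:\n" + describe(right, indent + "  "))

def fidelity(tree, frames):
    """Fraction of frames where the surrogate matches the agent's action."""
    hits = sum(predict(tree, to_percept(f)) == black_box_agent(f) for f in frames)
    return hits / len(frames)

rng = random.Random(0)
frames = [make_frame(rng) for _ in range(500)]
train, test = frames[:300], frames[300:]
tree = build_tree([to_percept(f) for f in train], [black_box_agent(f) for f in train])
score = fidelity(tree, test)
print(describe(tree))
print(f"fidelity on held-out frames: {score:.2f}")
```

The surrogate never sees the agent's internals, only its input frames and chosen actions, so the same recipe would apply to any policy that can be queried. The printed rules are the explanation; the fidelity number indicates how far those rules can be trusted as a description of the agent.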

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning