Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

Qisen Yang; Huanqian Wang; Mukun Tong; Wenjie Shi; Gao Huang; Shiji Song

arXiv:2309.01458·cs.LG·September 6, 2023

Open Access

TL;DR

This paper introduces a novel reward-based interpretability framework for reinforcement learning that maintains reward consistency and improves feature attribution, addressing limitations of existing action-matching explanation methods.

Contribution

The paper proposes RL-in-RL, a new method that focuses on reward consistency for interpretable feature discovery in RL, addressing the disconnect between actions and rewards.

Findings

01 Maintains reward consistency during feature attribution

02 Achieves high-quality, interpretable feature explanations

03 Outperforms existing methods in Atari and Duckietown environments

Abstract

The black-box nature of deep reinforcement learning (RL) hinders its adoption in real-world applications. Interpreting and explaining RL agents has therefore been an active research topic in recent years. Existing methods for post-hoc explanation usually adopt the action matching principle to enable an easy understanding of vision-based RL agents. In this paper, we argue that the commonly used action matching principle is more an explanation of deep neural networks (DNNs) than an interpretation of RL agents. It may lead to irrelevant or misplaced feature attribution when different DNN outputs lead to the same rewards or when different rewards result from the same outputs. We therefore propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents as well. To ensure reward consistency during interpretable feature discovery, a…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics