Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution

Nikaash Puri; Sukriti Verma; Piyush Gupta; Dhruv Kayastha; Shripad Deshmukh; Balaji Krishnamurthy; Sameer Singh

arXiv:1912.12191 · cs.CV · April 7, 2020 · 32 cites

TL;DR

This paper introduces SARFA, a perturbation-based saliency map method for deep reinforcement learning agents. It improves interpretability by highlighting only the input features that are relevant to the specific action being explained, and it is evaluated on several game environments.

Contribution

SARFA balances specificity and relevance to produce saliency maps for RL agents that are more focused, and therefore easier to interpret, than those of existing perturbation-based methods.

Findings

01

SARFA generates more focused and interpretable saliency maps.

02

Human studies confirm improved interpretability of SARFA maps.

03

Automated evaluations show SARFA's effectiveness across multiple game environments.

Abstract

As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our proposed approach, SARFA (Specific and Relevant Feature Attribution), generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of perturbation on the relative expected reward of the action to be explained. The second downweighs irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare…
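
The two aspects described in the abstract can be illustrated with a small sketch. The abstract does not state the formulas, so everything below is an assumption made for illustration: action values are turned into probabilities with a softmax, specificity is taken as the drop in the explained action's probability after perturbing a feature, relevance is taken as 1 / (1 + KL divergence) over the remaining actions, and the two are combined with a harmonic mean. The function names (`softmax`, `kl_divergence`, `sarfa_style_saliency`) are hypothetical and do not come from the authors' code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def sarfa_style_saliency(q_original, q_perturbed, action):
    """Saliency of one perturbed feature for one action (illustrative sketch).

    q_original / q_perturbed: action values before and after perturbing
    the feature. action: index of the action being explained.
    """
    # Specificity (assumed form): drop in the relative expected reward
    # of the explained action caused by the perturbation.
    delta_p = softmax(q_original)[action] - softmax(q_perturbed)[action]
    if delta_p <= 0:
        # Assumption: a perturbation that does not hurt the action
        # contributes no saliency.
        return 0.0

    # Relevance (assumed form): penalize perturbations that reshuffle
    # the relative values of the *other* actions.
    rest_orig = softmax([q for i, q in enumerate(q_original) if i != action])
    rest_pert = softmax([q for i, q in enumerate(q_perturbed) if i != action])
    k = 1.0 / (1.0 + kl_divergence(rest_pert, rest_orig))

    # Balance the two aspects with a harmonic mean (assumed combination).
    return 2.0 * k * delta_p / (k + delta_p)


# Two hypothetical perturbations that reduce action 0 by the same amount.
q = [2.0, 0.0, 0.0]
focused = sarfa_style_saliency(q, [0.0, 0.0, 0.0], action=0)
diffuse = sarfa_style_saliency(q, [0.0, math.log(1.9), math.log(0.1)], action=0)
print(focused, diffuse)
```

In this toy example both perturbations lower the explained action's probability by the same amount, but the second one also changes the ranking among the other actions, so it receives a lower score. That is the downweighting behavior the abstract attributes to the relevance term.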

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning