Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution
Nikaash Puri, Sukriti Verma, Piyush Gupta, Dhruv Kayastha, Shripad Deshmukh, Balaji Krishnamurthy, Sameer Singh

TL;DR
This paper introduces SARFA, a saliency map method for deep reinforcement learning agents. It improves interpretability by highlighting only the features that are relevant to the specific action being explained, and it is evaluated on agents in several game environments.
Contribution
SARFA is a new approach that balances specificity and relevance to produce more interpretable saliency maps for RL agents, outperforming existing methods.
Findings
SARFA generates more focused and interpretable saliency maps.
Human studies confirm improved interpretability of SARFA maps.
Automated evaluations show SARFA's effectiveness across multiple game environments.
Abstract
As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our proposed approach, SARFA (Specific and Relevant Feature Attribution), generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of perturbation on the relative expected reward of the action to be explained. The second downweighs irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare…
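The two aspects described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: "relative expected reward" is taken to be a softmax over Q-values, specificity is the drop in that quantity for the explained action after a perturbation, relevance is derived from the KL divergence between the softmax distributions over the remaining actions, and the two are combined with a harmonic mean. The function names `log_softmax` and `sarfa_style_saliency` are hypothetical.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax over a list of floats."""
    m = max(values)
    lse = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - lse for v in values]


def sarfa_style_saliency(q_original, q_perturbed, action):
    """Score one perturbed feature for `action` (illustrative, not the authors' code).

    q_original / q_perturbed: Q-values for every action before and after
    perturbing a single input feature. Needs at least two actions.
    """
    p_orig = math.exp(log_softmax(q_original)[action])
    p_pert = math.exp(log_softmax(q_perturbed)[action])

    # Specificity (assumed form): how much the perturbation lowers the
    # relative expected reward of the action being explained.
    delta_p = p_orig - p_pert
    if delta_p <= 0:
        return 0.0  # the perturbation did not hurt this action

    # Relevance (assumed form): renormalise over the *other* actions and
    # measure how much their relative rewards moved, via KL divergence.
    rest_orig = log_softmax([q for i, q in enumerate(q_original) if i != action])
    rest_pert = log_softmax([q for i, q in enumerate(q_perturbed) if i != action])
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(rest_pert, rest_orig))
    k = 1.0 / (1.0 + max(kl, 0.0))  # 1 when other actions are unaffected

    # Harmonic mean: the score is high only if both aspects are high.
    return 2.0 * k * delta_p / (k + delta_p)


if __name__ == "__main__":
    # Feature whose removal only hurts action 0.
    print(sarfa_style_saliency([2.0, 1.0, 0.0], [0.0, 1.0, 0.0], action=0))
    # Feature whose removal also reshuffles the other actions.
    print(sarfa_style_saliency([2.0, 1.0, 0.0], [0.0, 3.0, -2.0], action=0))
```

Under these assumptions, a feature scores highly only when perturbing it lowers the chosen action's relative reward while leaving the ranking among the other actions largely intact, which is the balance the abstract describes.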
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
