Benchmarking Perturbation-based Saliency Maps for Explaining Atari Agents
Tobias Huber, Benedikt Limmer, Elisabeth André

TL;DR
This paper evaluates and compares five perturbation-based saliency map methods for explaining Deep Reinforcement Learning agents in Atari games, addressing challenges in fidelity assessment and proposing solutions for improved evaluation.
Contribution
It provides a systematic comparison of saliency map approaches for DRL agents, introduces a fix for sanity check issues, and analyzes factors influencing method selection.
Findings
Identified issues with one saliency approach during sanity checks
Proposed a solution to fix sanity check issues
Determined key factors affecting saliency method effectiveness
Abstract
One of the most prominent methods for explaining the behavior of Deep Reinforcement Learning (DRL) agents is the generation of saliency maps that show how much each pixel contributed to the agent's decision. However, there is no work that computationally evaluates and compares the fidelity of different saliency map approaches specifically for DRL agents. It is particularly challenging to computationally evaluate saliency maps for DRL agents since their decisions are part of an overarching policy. For instance, the output neurons of value-based DRL algorithms encode both the value of the current state and the value of doing each action in this state. This ambiguity should be considered when evaluating saliency maps for such agents. In this paper, we compare five popular perturbation-based approaches to create saliency maps for DRL agents trained on four different Atari 2600 games.…
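To make the ambiguity described in the abstract concrete, the following is a minimal sketch of a generic occlusion-style perturbation saliency map. It is not one of the five approaches the paper benchmarks, and it is not the authors' evaluation procedure. The names `occlusion_saliency` and `toy_q_function`, the 4x4 state, the `patch` size, and the "advantage" score (chosen Q-value minus the mean Q-value) are all assumptions introduced for illustration.

```python
import numpy as np

def occlusion_saliency(q_function, state, patch=2, baseline=0.0, target="q"):
    """Occlusion-style saliency for a single 2D state (H x W).

    target="q":         measure the change in the raw Q-value of the chosen action.
    target="advantage": measure the change in Q(chosen) minus the mean Q-value,
                        an illustrative way to factor out the shared state value.
    """
    q_orig = q_function(state)
    action = int(np.argmax(q_orig))  # action chosen on the unperturbed state

    def score(q_values):
        if target == "q":
            return q_values[action]
        return q_values[action] - q_values.mean()

    reference = score(q_orig)
    height, width = state.shape
    saliency = np.zeros((height, width), dtype=float)
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            perturbed = state.copy()
            perturbed[top:top + patch, left:left + patch] = baseline
            change = abs(reference - score(q_function(perturbed)))
            saliency[top:top + patch, left:left + patch] = change
    return saliency

def toy_q_function(state):
    """Hypothetical stand-in for a trained value-based agent with two actions."""
    state_value = state[:2, :].sum()                   # shared by both actions
    advantage = np.array([state[3, 0], state[3, 3]])   # separates the actions
    return state_value + advantage

state = np.ones((4, 4))
state[3, 0] = 2.0  # makes action 0 the greedy action

sal_q = occlusion_saliency(toy_q_function, state, target="q")
sal_adv = occlusion_saliency(toy_q_function, state, target="advantage")
print(sal_q)
print(sal_adv)
```

In this toy setup, the raw Q-value map highlights the top rows, which only change the state value shared by all actions, while the advantage-based map highlights only the pixels that separate the two actions. The two maps disagree even though the agent and the perturbation are identical, which is the kind of ambiguity a fidelity evaluation has to account for.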
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
Methods: Q-Learning · Dense Connections · Convolution · Deep Q-Network
