SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies
Amir Samadi, Konstantinos Koufos, Kurt Debattista, Mehrdad Dianati

TL;DR
SAFE-RL introduces a saliency-aware counterfactual explanation framework for deep reinforcement learning policies, improving interpretability in safety-critical and high-dimensional environments by generating plausible, minimal, and informative counterfactuals.
Contribution
This work presents a novel saliency-guided method for generating counterfactual explanations in DRL, addressing challenges of high-dimensional inputs and temporal dependencies.
Findings
Outperforms state-of-the-art CF methods in diverse environments
Produces more plausible and informative counterfactuals
Effective across multiple DRL agents and tasks
Abstract
While Deep Reinforcement Learning (DRL) has emerged as a promising solution for intricate control tasks, the lack of explainability of the learned policies impedes its uptake in safety-critical applications, such as automated driving systems (ADS). Counterfactual (CF) explanations have recently gained prominence for their ability to interpret black-box Deep Learning (DL) models. CF examples are associated with minimal changes in the input, resulting in a complementary output by the DL model. Finding such alterations, particularly for high-dimensional visual inputs, poses significant challenges. Besides, the temporal dependency introduced by the reliance of the DRL agent action on a history of past state observations further complicates the generation of CF examples. To address these challenges, we propose using a saliency map to identify the most influential input pixels across the…
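To make the abstract's terms concrete, here is a minimal toy sketch of the general idea: rank the pixels of a stacked observation by saliency, then edit only the most influential ones until the policy's action changes. Everything in it is an assumption made for illustration: the linear two-action "policy", the names `logits`, `act`, and `saliency_guided_counterfactual`, and the greedy editing rule. It is not the SAFE-RL method, which the truncated abstract does not fully describe.

```python
# Toy illustration only (hypothetical names, not the SAFE-RL implementation).
# The "policy" is linear over a stack of flattened frames, so the gradient of
# the action margin with respect to each pixel is just a weight difference.

def logits(weights, stacked):
    """Score each action as a dot product over all stacked pixels."""
    return [sum(w * x for w, x in zip(row, stacked)) for row in weights]

def act(weights, stacked):
    """Greedy action: index of the highest score."""
    scores = logits(weights, stacked)
    return max(range(len(scores)), key=scores.__getitem__)

def saliency_guided_counterfactual(weights, stacked, lo=0.0, hi=1.0):
    """Edit the most salient pixels first until the chosen action flips."""
    original = act(weights, stacked)
    scores = logits(weights, stacked)
    rivals = [a for a in range(len(scores)) if a != original]
    runner_up = max(rivals, key=scores.__getitem__)
    # Gradient of (original score - runner-up score) w.r.t. each pixel.
    grad = [wo - wr for wo, wr in zip(weights[original], weights[runner_up])]
    # Saliency is the gradient magnitude; visit pixels from most to least salient.
    order = sorted(range(len(stacked)), key=lambda i: abs(grad[i]), reverse=True)
    cf, changed = list(stacked), []
    for i in order:
        if act(weights, cf) != original or grad[i] == 0:
            break  # action already flipped, or remaining pixels have no effect
        target = lo if grad[i] > 0 else hi  # push the pixel against the margin
        if cf[i] != target:
            cf[i] = target
            changed.append(i)
    return cf, changed

# Two stacked frames of three pixels each (a short observation history).
weights = [[0.9, 0.1, 0.0, 0.2, 0.0, 0.1],   # action 0
           [0.0, 0.1, 0.3, 0.0, 0.2, 0.1]]   # action 1
stacked = [1.0, 0.5, 0.0, 1.0, 0.0, 0.5]
cf, changed = saliency_guided_counterfactual(weights, stacked)
print(act(weights, stacked), act(weights, cf), changed)
```

In this toy case only two of the six pixels, both in the first frame, are edited before the action changes. That is the sense in which a counterfactual is "minimal", and stacking frames is a simple stand-in for the history of past observations the abstract mentions.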
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
