SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies

Amir Samadi; Konstantinos Koufos; Kurt Debattista; Mehrdad Dianati

arXiv:2404.18326 · cs.LG · April 30, 2024

TL;DR

SAFE-RL introduces a saliency-aware counterfactual explanation framework for deep reinforcement learning policies, improving interpretability in safety-critical and high-dimensional environments by generating plausible, minimal, and informative counterfactuals.

Contribution

This work presents a novel saliency-guided method for generating counterfactual explanations in DRL, addressing challenges of high-dimensional inputs and temporal dependencies.

Findings

01 Outperforms state-of-the-art CF methods in diverse environments

02 Produces more plausible and informative counterfactuals

03 Effective across multiple DRL agents and tasks

Abstract

While Deep Reinforcement Learning (DRL) has emerged as a promising solution for intricate control tasks, the lack of explainability of the learned policies impedes its uptake in safety-critical applications, such as automated driving systems (ADS). Counterfactual (CF) explanations have recently gained prominence for their ability to interpret black-box Deep Learning (DL) models. A CF example is a minimally changed input that causes the DL model to produce a complementary output. Finding such alterations, particularly for high-dimensional visual inputs, poses significant challenges. In addition, the DRL agent's action relies on a history of past state observations, and this temporal dependency further complicates the generation of CF examples. To address these challenges, we propose using a saliency map to identify the most influential input pixels across the…
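The idea in the abstract, restricting counterfactual edits to the inputs a saliency map marks as most influential, can be illustrated with a toy sketch. This is not the SAFE-RL implementation: the linear policy, the finite-difference saliency, the greedy sign-step search, and every name below (`logits`, `greedy_action`, `saliency`, `counterfactual`, `k`, `step`) are assumptions made purely for illustration, and the observation is a flat list of numbers rather than a stack of image frames.

```python
# Toy illustration only. A hypothetical linear policy stands in for the DRL
# agent, and finite-difference sensitivity stands in for a saliency method.

def logits(weights, obs):
    """Action scores of a linear policy: one weight row per action."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]

def greedy_action(weights, obs):
    """Index of the highest-scoring action."""
    scores = logits(weights, obs)
    return max(range(len(scores)), key=scores.__getitem__)

def saliency(weights, obs, action, eps=1e-3):
    """Sensitivity of the chosen action's score to each input element."""
    base = logits(weights, obs)[action]
    sal = []
    for i in range(len(obs)):
        bumped = list(obs)
        bumped[i] += eps
        sal.append(abs(logits(weights, bumped)[action] - base) / eps)
    return sal

def counterfactual(weights, obs, target, k=2, step=0.05, max_iters=200):
    """Edit only the k most salient inputs until the policy picks `target`.

    Returns (counterfactual_obs, edited_indices); the first item is None
    if no counterfactual is found within max_iters.
    """
    current = greedy_action(weights, obs)
    sal = saliency(weights, obs, current)
    top = sorted(range(len(obs)), key=lambda i: sal[i], reverse=True)[:k]
    cf = list(obs)
    for _ in range(max_iters):
        if greedy_action(weights, cf) == target:
            return cf, top
        for i in top:
            # Step in the direction that favours `target` over `current`.
            direction = weights[target][i] - weights[current][i]
            if direction > 0:
                cf[i] += step
            elif direction < 0:
                cf[i] -= step
    return None, top

if __name__ == "__main__":
    weights = [[1.0, 0.1, 0.0, 0.2],   # action 0
               [0.2, 0.1, 0.0, 1.0]]   # action 1
    obs = [0.9, 0.5, 0.5, 0.3]
    cf, edited = counterfactual(weights, obs, target=1)
    print("original action:", greedy_action(weights, obs))
    print("edited indices:", edited)
    print("counterfactual:", cf)
```

Inputs outside the top-`k` salient set are never modified, which is the property the sketch is meant to show: the change stays small and is concentrated where the policy is most sensitive. A real image-based agent would need a differentiable saliency method and a way to keep the edited observation plausible, neither of which this toy attempts.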

Code & Models

Repositories

amir-samadi/safe-rl (PyTorch, official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI