Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities
Jasmina Gajcin, Ivana Dusparic

TL;DR
This paper reviews counterfactual explanations in supervised learning, identifies challenges in applying them to reinforcement learning, and proposes a redefinition and future research directions for their use in RL systems.
Contribution
It redefines counterfactual explanations specifically for reinforcement learning and outlines key challenges and opportunities for their development and application.
Findings
Counterfactual explanations are well-studied in supervised learning.
Applying counterfactuals to RL faces unique challenges.
The paper proposes a new framework for RL counterfactuals.
Abstract
While AI algorithms have shown remarkable success in various fields, their lack of transparency hinders their application to real-life tasks. Although explanations targeted at non-experts are necessary for user trust and human-AI collaboration, the majority of explanation methods for AI are focused on developers and expert users. Counterfactual explanations are local explanations that offer users advice on what can be changed in the input for the output of the black-box model to change. Counterfactuals are user-friendly and provide actionable advice for achieving the desired output from the AI system. While extensively researched in supervised learning, there are few methods applying them to reinforcement learning (RL). In this work, we explore the reasons for the underrepresentation of a powerful explanation method in RL. We start by reviewing the current work in counterfactual…
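To make the abstract's definition concrete, here is a minimal sketch of a counterfactual search in the supervised setting: given an input and an opaque model, find the smallest change to the input that changes the model's output. Everything in it is a hypothetical illustration rather than the paper's method. The `black_box` scoring rule, the two features (income and debt), and the brute-force L1 search in `find_counterfactual` are all assumptions chosen for readability.

```python
from itertools import product


def black_box(income, debt):
    """Hypothetical stand-in for an opaque model: 1 = approve, 0 = reject."""
    return 1 if 0.5 * income - 0.8 * debt >= 20 else 0


def find_counterfactual(x, model, steps, target):
    """Brute-force search for the input closest to x (L1 distance)
    that the model maps to the target output."""
    best, best_dist = None, float("inf")
    for delta in product(steps, repeat=len(x)):
        candidate = tuple(xi + di for xi, di in zip(x, delta))
        dist = sum(abs(di) for di in delta)
        # Keep the candidate only if it flips the output and is strictly closer.
        if model(*candidate) == target and dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist


original = (50, 10)        # (income, debt); assumed example input
steps = range(-10, 11)     # integer changes allowed per feature (assumption)

counterfactual, distance = find_counterfactual(original, black_box, steps, target=1)
print("original output:", black_box(*original))
print("counterfactual input:", counterfactual, "distance:", distance)
```

The result reads as actionable advice of the kind the abstract describes: it names which feature to change, and by how much, for the decision to flip. Exhaustive search is used only because the toy input space is tiny; it does not address the RL-specific challenges the paper discusses.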
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques
Methods: Counterfactual Explanations
