Explainable Deep Reinforcement Learning: State of the Art and Challenges
George A. Vouros

TL;DR
This paper reviews current methods for making deep reinforcement learning more explainable, addressing the need for transparency and trust in critical real-world applications, and discusses challenges and future directions.
Contribution
It provides a formal framework for explainable deep reinforcement learning and categorizes existing methods, highlighting open challenges and research gaps.
Findings
Categorization of explainable DRL methods by paradigm and explanation surface
Identification of key components for a general explainable DRL framework
Discussion of open challenges and future research directions
Abstract
Interpretability, explainability and transparency are key issues in introducing Artificial Intelligence methods in many critical domains. This is important due to ethical concerns and trust issues strongly connected to reliability, robustness, auditability and fairness, and it has important consequences for keeping the human in the loop at high levels of automation, especially in critical decision-making cases where both the human and the machine play important roles. While the research community has given much attention to the explainability of closed (or black) prediction boxes, there is a pressing need for explainability of closed-box methods that support agents in acting autonomously in the real world. Reinforcement learning methods, and especially their deep versions, are such closed-box methods. In this article we aim to provide a review of state-of-the-art methods for explainable…
