A Survey on Explainable Deep Reinforcement Learning

Zelei Cheng; Jiahao Yu; Xinyu Xing

arXiv:2502.06869·cs.LG·February 12, 2025

Open Access

TL;DR

This survey reviews explainable deep reinforcement learning methods, their evaluation, and integration with large language models, aiming to improve transparency, trust, and safety in AI decision-making systems.

Contribution

It provides a comprehensive overview of XRL techniques and their assessment frameworks, and explores the integration of RL with LLMs, particularly through RLHF, for better AI alignment.

Findings

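01 XRL enhances transparency at the feature, state, dataset, and model levels

02 Qualitative and quantitative evaluation frameworks for XRL are reviewed

03 Integrating RL with LLMs, particularly through RLHF, improves alignment with human preferences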

Abstract

Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making tasks across diverse domains, yet its reliance on black-box neural architectures hinders interpretability, trust, and deployment in high-stakes applications. Explainable Deep Reinforcement Learning (XRL) addresses these challenges by enhancing transparency through feature-level, state-level, dataset-level, and model-level explanation techniques. This survey provides a comprehensive review of XRL methods, evaluates their qualitative and quantitative assessment frameworks, and explores their role in policy refinement, adversarial robustness, and security. Additionally, we examine the integration of reinforcement learning with Large Language Models (LLMs), particularly through Reinforcement Learning from Human Feedback (RLHF), which optimizes AI alignment with human preferences. We conclude by…

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare