From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation

Peilang Li; Umer Siddique; Yongcan Cao

arXiv:2501.09858·cs.LG·January 20, 2025

Open Access

TL;DR

This paper introduces a model-agnostic method using Shapley values to transform deep reinforcement learning policies into interpretable, transparent models, enhancing understanding without sacrificing performance.

Contribution

It presents a novel, general framework for global interpretability of deep RL policies using Shapley values, applicable to various algorithms and environments.

Findings

01. Preserves original model performance

02. Produces more stable interpretable policies

03. Validates approach on classic control environments

Abstract

Deep reinforcement learning (RL) has shown remarkable success in complex domains; however, the inherent black-box nature of deep neural network policies raises significant challenges in understanding and trusting the decision-making processes. While existing explainable RL methods provide local insights, they fail to deliver a global understanding of the model, particularly in high-stakes applications. To overcome this limitation, we propose a novel model-agnostic approach that bridges the gap between explainability and interpretability by leveraging Shapley values to transform complex deep RL policies into transparent representations. The proposed approach offers two key contributions: a novel approach that applies Shapley values to policy interpretation beyond local explanations, and a general framework applicable to both off-policy and on-policy algorithms. We evaluate our approach with three…
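
For readers unfamiliar with Shapley values, the following is a minimal sketch of how a per-feature attribution can be computed exactly for a small state vector. It is not the authors' pipeline: the function `shapley_values`, the stand-in `toy_policy_score`, the example state, and the zero baseline used to mask absent features are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, state, baseline):
    """Exact Shapley value of each state feature for value_fn(state).

    Features outside a coalition are replaced by their baseline value.
    This masking scheme is an assumption; other choices are common.
    """
    n = len(state)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Input with only the coalition's features "present"
                without_i = [state[j] if j in subset else baseline[j]
                             for j in range(n)]
                # Same input with feature i added to the coalition
                with_i = list(without_i)
                with_i[i] = state[i]
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


def toy_policy_score(s):
    """Hypothetical stand-in for a trained policy's preference for one action."""
    return 0.5 * s[0] - 2.0 * s[1] + s[0] * s[2]


state = [1.0, 0.5, 2.0]      # illustrative observation
baseline = [0.0, 0.0, 0.0]   # assumed reference values for "absent" features
phi = shapley_values(toy_policy_score, state, baseline)
print(phi)
```

The attributions sum to `toy_policy_score(state) - toy_policy_score(baseline)`, which is the efficiency property of Shapley values. Exact enumeration costs on the order of 2^n evaluations per feature, so it is only practical for low-dimensional states; larger problems typically rely on sampling-based approximations.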

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)