A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values

Daniel Beechey; Thomas M. S. Smith; Özgür Şimşek

arXiv:2505.07797·cs.LG·August 1, 2025

TL;DR

This paper introduces SVERL, a theoretical framework that uses Shapley values to explain the behaviour, outcomes, and predictions of reinforcement learning agents through mathematically justified, interpretable feature influences.

Contribution

It develops a unified, axiomatic approach to explaining RL agents through the influence of the individual features they observe, addressing the limited interpretability and conceptual clarity of prior explanations.

Findings

01. SVERL provides precise, interpretable explanations of RL agents.

02. The framework identifies and corrects conceptual issues in prior explanations.

03. Illustrative examples demonstrate the usefulness of SVERL in understanding agent behaviour.

Abstract

Reinforcement learning agents can achieve super-human performance in complex decision-making tasks, but their behaviour is often difficult to understand and explain. This lack of explanation limits deployment, especially in safety-critical settings where understanding and trust are essential. We identify three core explanatory targets that together provide a comprehensive view of reinforcement learning agents: behaviour, outcomes, and predictions. We develop a unified theoretical framework for explaining these three elements of reinforcement learning agents through the influence of individual features that the agent observes in its environment. We derive feature influences by using Shapley values, which collectively and uniquely satisfy a set of well-motivated axioms for fair and consistent credit assignment. The proposed approach, Shapley Values for Explaining Reinforcement Learning…
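As a minimal illustration of the Shapley value computation the abstract refers to, the sketch below computes exact Shapley values for a small set of features by averaging each feature's marginal contribution over all subsets of the other features. The names `shapley_values` and `toy_value`, the feature names, and the toy value function itself are assumptions made for illustration only; they are not the value functions SVERL defines for behaviour, outcomes, or predictions.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values of `features` under the set function `value`.

    `value` maps a frozenset of features to a number. The cost is
    exponential in the number of features, so this only suits toy examples.
    """
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(coalition):
    """Hypothetical value function: x and y matter and interact, z is irrelevant."""
    v = 0.0
    if "x" in coalition:
        v += 1.0
    if "y" in coalition:
        v += 2.0
    if "x" in coalition and "y" in coalition:
        v += 1.0  # interaction term, split equally between x and y
    return v


phi = shapley_values(["x", "y", "z"], toy_value)
print(phi)
```

In this toy game the interaction bonus is shared equally, so `x` receives 1.5, `y` receives 2.5, and the irrelevant feature `z` receives 0. The three values sum to the difference between the value of all features and the value of none, which is the efficiency axiom; the zero for `z` reflects the null-player axiom.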

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning

Methods: ALIGN