Explaining Reinforcement Learning with Shapley Values
Daniel Beechey, Thomas M. S. Smith, Özgür Şimşek

TL;DR
This paper introduces SVERL, a principled framework that uses Shapley values to explain reinforcement learning agents. It exposes the limitations of earlier uses of Shapley values in reinforcement learning and produces explanations of agent performance that match and supplement human intuition across a variety of domains.
Contribution
It develops a novel theoretical framework, SVERL, for explaining reinforcement learning with Shapley values, and demonstrates its effectiveness in multiple domains.
Findings
SVERL produces explanations that align with human intuition.
The analysis exposes limitations of earlier uses of Shapley values in reinforcement learning.
SVERL offers meaningful insights into agent performance.
Abstract
For reinforcement learning systems to be widely adopted, their users must understand and trust them. We present a theoretical analysis of explaining reinforcement learning using Shapley values, following a principled approach from game theory for identifying the contribution of individual players to the outcome of a cooperative game. We call this general framework Shapley Values for Explaining Reinforcement Learning (SVERL). Our analysis exposes the limitations of earlier uses of Shapley values in reinforcement learning. We then develop an approach that uses Shapley values to explain agent performance. In a variety of domains, SVERL produces meaningful explanations that match and supplement human intuition.
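To make the game-theoretic idea in the abstract concrete, here is a minimal sketch of how Shapley values attribute the outcome of a cooperative game to its individual players. It is a generic illustration, not the SVERL method: this summary does not specify how SVERL defines the players or the value of a coalition, so the function names `shapley_values` and `toy_value`, the player labels, and all numbers below are assumptions made up for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small cooperative game.

    players: list of hashable player labels.
    value:   hypothetical characteristic function mapping a frozenset of
             players to the payoff that coalition achieves.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    # Made-up payoffs for a two-player game (illustrative only).
    table = {
        frozenset(): 0.0,
        frozenset({"x"}): 4.0,
        frozenset({"y"}): 2.0,
        frozenset({"x", "y"}): 10.0,
    }
    return table[coalition]


phi = shapley_values(["x", "y"], toy_value)
print(phi)  # contributions sum to value(all players) - value(no players)
```

In this toy game, player "x" receives 6 and player "y" receives 4, which together account for the full payoff of 10. The exact computation enumerates every coalition, so its cost grows exponentially with the number of players; it is practical only for small games like this one.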
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
