Explaining Reinforcement Learning: A Counterfactual Shapley Values Approach
Yiwei Shi, Qi Zhang, Kevin McAreavey, Weiru Liu

TL;DR
This paper presents a new explainability method for reinforcement learning called Counterfactual Shapley Values, which uses counterfactual analysis to better understand how different state features influence action choices.
Contribution
The paper introduces novel characteristic value functions for calculating Shapley values in RL, enhancing interpretability by quantifying feature contributions to decisions.
Findings
CSV improves transparency in RL models
Effective across multiple RL domains
Quantifies differences between optimal and non-optimal actions
Abstract
This paper introduces a novel approach, Counterfactual Shapley Values (CSV), which enhances explainability in reinforcement learning (RL) by integrating counterfactual analysis with Shapley values. The approach aims to quantify and compare the contributions of different state dimensions to various action choices. To analyze these impacts more accurately, we introduce two new characteristic value functions, the "Counterfactual Difference Characteristic Value" and the "Average Counterfactual Difference Characteristic Value." These functions are used to calculate Shapley values that evaluate the differences in contributions between optimal and non-optimal actions. Experiments across several RL domains, such as GridWorld, FrozenLake, and Taxi, demonstrate the effectiveness of the CSV method. The results show that this method not only improves transparency in complex RL systems but also quantifies…
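The abstract does not spell out the characteristic value functions, so the following is only a minimal sketch of the general idea, not the paper's definitions. It assumes a hypothetical toy Q-function (`Q_WEIGHTS`, `q_value`), a hypothetical baseline state used to "switch off" features outside a coalition (`masked`), and a characteristic function equal to the Q-value gap between the chosen action and one counterfactual action. The Shapley computation itself is the standard exact formula over all feature subsets.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_features, value_fn):
    """Exact Shapley values for a characteristic function over feature-index sets."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for k in range(len(others) + 1):
            # Standard Shapley weight for coalitions of size k that exclude i.
            weight = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy Q-function: linear in the state features, one weight row per action.
Q_WEIGHTS = {"left": [1.0, 0.0, 2.0], "right": [0.5, 1.0, 0.0]}


def q_value(state, action):
    return sum(w * x for w, x in zip(Q_WEIGHTS[action], state))


def masked(state, baseline, subset):
    """Keep features in `subset`; replace all others with baseline values (an assumption)."""
    return [state[i] if i in subset else baseline[i] for i in range(len(state))]


def counterfactual_difference(state, baseline, a_opt, a_cf):
    """Illustrative characteristic function: Q gap between chosen and counterfactual action."""
    def value_fn(subset):
        s = masked(state, baseline, subset)
        return q_value(s, a_opt) - q_value(s, a_cf)
    return value_fn


state = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
v = counterfactual_difference(state, baseline, "left", "right")
phi = shapley_values(len(state), v)
print(phi)
```

Because the toy Q-function is linear, each feature's Shapley value here reduces to its own weight difference times its deviation from the baseline, and the values sum to the full Q-value gap between the two actions. A learned, non-linear Q-function would not decompose this simply, and exact enumeration grows exponentially with the number of state features, so a sampling approximation would likely be needed in larger domains.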
Taxonomy
Topics: Ethics and Social Impacts of AI · Law, Economics, and Judicial Systems
