Causal policy ranking
Daniel McNamee, Hana Chockler

TL;DR
This paper introduces a black-box causal method to rank decisions in reinforcement learning policies based on their direct impact on rewards, enhancing interpretability of complex RL policies.
Contribution
It proposes a novel counterfactual reasoning approach for causal policy ranking in RL, providing a new way to interpret decision importance.
Findings
Causal ranking correlates with reward contribution.
A comparison against a non-causal ranking procedure highlights the benefits of causality-based ranking.
Preliminary results show promise for causal interpretability in RL.
Abstract
Policies trained via reinforcement learning (RL) are often very complex even for simple tasks. In an episode with n time steps, a policy will make n decisions on actions to take, many of which may appear non-intuitive to the observer. Moreover, it is not clear which of these decisions directly contribute towards achieving the reward, or how significant their contribution is. Given a trained policy, we propose a black-box method based on counterfactual reasoning that estimates the causal effect that these decisions have on reward attainment and ranks the decisions according to this estimate. In this preliminary work, we compare our measure against an alternative, non-causal, ranking procedure, highlight the benefits of causality-based policy ranking, and discuss potential future work integrating causal algorithms into the interpretation of RL agent policies.
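The idea of ranking decisions by their counterfactual effect on reward can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy task, the names `step`, `policy`, `rollout`, `rank_decisions`, and `HAZARD_STEPS`, and the effect measure (baseline return minus the mean return after forcing a different action at one time step) are all assumptions made for illustration. The policy is treated as a black box and is only queried for actions.

```python
from typing import Dict, List, Optional, Tuple

State = Tuple[int, bool]  # (time step, agent still alive) in a hypothetical toy task
Action = int              # 0 = "brake", 1 = "go"

HAZARD_STEPS = {1, 3}     # assumed: "go" is fatal at these time steps only


def step(state: State, action: Action) -> Tuple[State, float]:
    """Toy dynamics: reward 1.0 for every step the agent survives."""
    t, alive = state
    if alive and t in HAZARD_STEPS and action == 1:
        alive = False
    return (t + 1, alive), (1.0 if alive else 0.0)


def policy(state: State) -> Action:
    """Stand-in for a trained policy; only its outputs are observed."""
    t, _ = state
    return 0 if t in HAZARD_STEPS else 1


def rollout(horizon: int,
            override: Optional[Dict[int, Action]] = None) -> Tuple[float, List[Action]]:
    """Run one episode; `override` forces the action at selected time steps."""
    override = override or {}
    state: State = (0, True)
    total, taken = 0.0, []
    for t in range(horizon):
        action = override.get(t, policy(state))
        state, reward = step(state, action)
        total += reward
        taken.append(action)
    return total, taken


def rank_decisions(horizon: int, actions: List[Action]) -> List[Tuple[int, float]]:
    """Rank time steps by the return lost when that single decision is changed."""
    baseline, taken = rollout(horizon)
    effects = []
    for t in range(horizon):
        # Counterfactual: force each alternative action at step t, then let
        # the policy continue from whatever state results.
        alternatives = [a for a in actions if a != taken[t]]
        cf_returns = [rollout(horizon, {t: a})[0] for a in alternatives]
        effects.append((t, baseline - sum(cf_returns) / len(cf_returns)))
    # sorted() is stable, so tied decisions keep their temporal order.
    return sorted(effects, key=lambda e: e[1], reverse=True)


ranking = rank_decisions(horizon=5, actions=[0, 1])
print(ranking)  # [(1, 4.0), (3, 2.0), (0, 0.0), (2, 0.0), (4, 0.0)]
```

In this toy episode, only the decisions at steps 1 and 3 affect the return, and the earlier one ranks higher because changing it forfeits more of the remaining reward. A stochastic environment or policy would need the counterfactual returns averaged over repeated rollouts, which this sketch omits.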
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Neural and Behavioral Psychology Studies
