Causal policy ranking

Daniel McNamee; Hana Chockler

arXiv:2111.08415·cs.AI·November 17, 2021

Open Access

TL;DR

This paper introduces a black-box causal method to rank decisions in reinforcement learning policies based on their direct impact on rewards, enhancing interpretability of complex RL policies.

Contribution

It proposes a novel counterfactual reasoning approach for causal policy ranking in RL, providing a new way to interpret decision importance.

Findings

01 Causal ranking correlates with reward contribution.
02 Causal method outperforms non-causal ranking in interpretability.
03 Preliminary results show promise for causal interpretability in RL.

Abstract

Policies trained via reinforcement learning (RL) are often very complex even for simple tasks. In an episode with $n$ time steps, a policy will make $n$ decisions on actions to take, many of which may appear non-intuitive to the observer. Moreover, it is not clear which of these decisions directly contribute towards achieving the reward, or how significant their contribution is. Given a trained policy, we propose a black-box method based on counterfactual reasoning that estimates the causal effect that these decisions have on reward attainment and ranks the decisions according to this estimate. In this preliminary work, we compare our measure against an alternative, non-causal ranking procedure, highlight the benefits of causality-based policy ranking, and discuss potential future work integrating causal algorithms into the interpretation of RL agent policies.
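The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea of counterfactual decision ranking, not the authors' procedure. It assumes a hypothetical deterministic corridor environment, a hand-written policy treated as a black box, and one particular effect measure: the baseline reward minus the mean reward obtained when the action at a single time step is replaced by each alternative action, with the policy resuming control afterwards. All names (`step`, `policy`, `rollout`, `causal_effects`) and constants are illustrative assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical toy setup (not from the paper): a 1-D corridor with
# positions 0..MAX_POS, reward 1 only if the episode ends on GOAL.
ACTIONS = (-1, 0, 1)          # move left, stay, move right
GOAL, HORIZON, MAX_POS = 3, 4, 5


def step(pos: int, action: int) -> int:
    """Deterministic transition, clipped to the corridor bounds."""
    return min(max(pos + action, 0), MAX_POS)


def policy(pos: int) -> int:
    """Stand-in for a trained policy; only queried, never inspected."""
    if pos < GOAL:
        return 1
    if pos > GOAL:
        return -1
    return 0


def rollout(pi: Callable[[int], int],
            intervene_at: int = -1,
            forced_action: int = 0) -> Tuple[float, List[int]]:
    """Run one episode; optionally override the action at one time step."""
    pos, actions = 0, []
    for t in range(HORIZON):
        action = forced_action if t == intervene_at else pi(pos)
        actions.append(action)
        pos = step(pos, action)
    return (1.0 if pos == GOAL else 0.0), actions


def causal_effects(pi: Callable[[int], int]) -> List[float]:
    """Assumed effect measure: baseline reward minus the mean reward
    over all alternative actions forced at step t."""
    baseline, factual = rollout(pi)
    effects = []
    for t in range(HORIZON):
        alternatives = [a for a in ACTIONS if a != factual[t]]
        rewards = [rollout(pi, t, a)[0] for a in alternatives]
        effects.append(baseline - sum(rewards) / len(rewards))
    return effects


effects = causal_effects(policy)
# Rank time steps from most to least influential on the reward.
ranking = sorted(range(HORIZON), key=lambda t: -effects[t])
print(effects, ranking)
```

In this toy setting the final decision ranks highest, because no later step can undo a bad action, while the first decision ranks lowest, because the policy has enough slack to recover. A stochastic environment or policy would need repeated rollouts per intervention and an averaged estimate instead of the exact enumeration used here.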

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Neural and Behavioral Psychology Studies