Counterfactual Explanation Policies in RL
Shripad V. Deshmukh, Srivatsan R, Supriti Vijay, Jayakumar Subramanian, Chirag Agarwal

TL;DR
This paper introduces COUNTERPOL, a framework for generating counterfactual explanations of reinforcement learning policies. It improves interpretability by identifying the minimal policy changes needed to reach a desired outcome.
Contribution
COUNTERPOL is the first method to systematically analyze RL policies using counterfactual explanations, linking them to trust region optimization techniques.
Findings
Effective in explaining skill (un)learning across diverse environments
Produces minimal policy modifications for targeted outcomes
Demonstrates utility in multiple RL settings
Abstract
As Reinforcement Learning (RL) agents are increasingly employed in diverse decision-making problems using reward preferences, it becomes important to ensure that policies learned by these frameworks in mapping observations to a probability distribution of the possible actions are explainable. However, there is little to no work in the systematic understanding of these complex policies in a contrastive manner, i.e., what minimal changes to the policy would improve/worsen its performance to a desired level. In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations in the form of minimal changes to the policy that lead to the desired outcome. We do so by incorporating counterfactuals in supervised learning in RL with the target outcome regulated using desired return. We establish a theoretical connection between COUNTERPOL and widely…
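The idea of a "minimal change to the policy that leads to a desired return" can be illustrated with a toy sketch. The code below is not the authors' implementation: it assumes a two-armed bandit with known rewards, a softmax policy, and a hypothetical objective that adds a squared gap between the expected return and a target return to a KL penalty (weight `k`) that keeps the new policy close to the original one. The function names, the exact form of the objective, the direction of the KL term, and all hyperparameters are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert logits to action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def expected_return(probs, rewards):
    """Exact expected return of a bandit policy (no sampling)."""
    return sum(p * r for p, r in zip(probs, rewards))


def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def objective(logits, base_probs, rewards, target, k):
    """Assumed objective: squared return gap plus a KL proximity penalty."""
    probs = softmax(logits)
    gap = expected_return(probs, rewards) - target
    return gap * gap + k * kl(base_probs, probs)


def counterfactual_policy(base_logits, rewards, target,
                          k=0.01, lr=0.5, steps=2000, eps=1e-6):
    """Gradient descent on the assumed objective, starting from the base policy.

    Gradients are estimated with central finite differences to keep the
    sketch dependency-free.
    """
    base_probs = softmax(base_logits)
    logits = list(base_logits)
    for _ in range(steps):
        grad = []
        for i in range(len(logits)):
            up = logits[:]
            up[i] += eps
            down = logits[:]
            down[i] -= eps
            diff = (objective(up, base_probs, rewards, target, k)
                    - objective(down, base_probs, rewards, target, k))
            grad.append(diff / (2 * eps))
        logits = [x - lr * g for x, g in zip(logits, grad)]
    return softmax(logits)


# Hypothetical setup: a uniform base policy (expected return 0.5)
# and a desired return of 0.8.
rewards = [1.0, 0.0]
base_logits = [0.0, 0.0]
cf_probs = counterfactual_policy(base_logits, rewards, target=0.8)
print("counterfactual policy:", cf_probs)
print("expected return:", expected_return(cf_probs, rewards))
```

In this sketch, a larger `k` keeps the counterfactual policy closer to the original at the cost of a larger gap to the target return. A full RL setting would estimate returns from sampled trajectories rather than computing them exactly.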
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
Methods: Counterfactual Explanations
