Learning "What-if" Explanations for Sequential Decision-Making
Ioana Bica, Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar

TL;DR
This paper introduces a method for interpreting expert decision-making by modeling the expert's reward function in terms of counterfactual ("what if") outcomes, enabling explanations of actions in sequential settings without active experimentation.
Contribution
It proposes a novel batch inverse reinforcement learning approach that incorporates counterfactuals to explain expert behavior in sequential decision-making tasks.
Findings
Effective in real and simulated medical environments
Accurately recovers interpretable behavior descriptions
Handles off-policy evaluation with counterfactual reasoning
Abstract
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior -- i.e. trajectories of observations and actions made by an expert maximizing some unknown reward function -- is essential for introspecting and auditing policies in different institutions. In this paper, we propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes: Given the current history of observations, what would happen if we took a particular action? To learn these cost-benefit tradeoffs associated with the expert's actions, we integrate counterfactual reasoning into batch inverse reinforcement learning. This offers a principled way of defining reward functions and explaining expert behavior, and also satisfies the constraints of real-world decision-making -- where active experimentation…
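To make the "what if" framing concrete, the following is a minimal sketch of a reward defined over counterfactual outcomes, not the authors' implementation. It assumes a linear reward with one weight per outcome feature, and it assumes the counterfactual predictions for each candidate action are already available. The function names, feature meanings, and numbers are all hypothetical, and the sketch covers a single decision step only; it omits the sequential structure and the batch inverse reinforcement learning step that would estimate the weights from demonstrations.

```python
import numpy as np


def what_if_reward(weights, counterfactual_outcomes):
    """Score each candidate action by a weighted sum of its predicted outcomes.

    counterfactual_outcomes has one row per action and one column per
    outcome feature: "what would happen if we took this action?"
    (Linear form is an assumption made for this illustration.)
    """
    return counterfactual_outcomes @ weights


def greedy_action(rewards):
    """Pick the action with the highest one-step reward (a simplification)."""
    return int(np.argmax(rewards))


# Hypothetical counterfactual predictions for two actions.
# Columns: [change in disease burden, side-effect severity]
outcomes = np.array([
    [-1.0, 0.5],   # action 0: treat
    [0.2, 0.0],    # action 1: wait
])

# Hypothetical preference weights: both features are costs, so both are
# negative; disease burden matters twice as much as side effects.
weights = np.array([-1.0, -0.5])

rewards = what_if_reward(weights, outcomes)
chosen = greedy_action(rewards)
print(rewards, chosen)
```

Read this way, the weights are the explanation: their signs and relative magnitudes describe how the expert trades off one predicted outcome against another.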
Taxonomy
Topics: Advanced Causal Inference Techniques · Explainable Artificial Intelligence (XAI) · Health Systems, Economic Evaluations, Quality of Life
Methods: Counterfactual Explanations
