Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models

Michael Oberst; David Sontag

arXiv:1905.05824·cs.LG·March 4, 2021·58 cites

TL;DR

This paper presents an off-policy evaluation method that uses Gumbel-Max structural causal models to flag episodes where an RL policy would likely have produced a substantially different reward than the observed policy, supporting debugging in high-risk RL applications such as healthcare.

Contribution

The paper introduces a class of SCMs for generating counterfactual trajectories in finite POMDPs, enabling episode-level analysis of how the RL policy and the observed policy differ.

Findings

01. Effective identification of episodes with large reward discrepancies
02. Demonstrated utility in a synthetic sepsis management environment
03. Facilitates targeted review by domain experts

Abstract

We introduce an off-policy evaluation procedure for highlighting episodes where applying a reinforcement learned (RL) policy is likely to have produced a substantially different outcome than the observed policy. In particular, we introduce a class of structural causal models (SCMs) for generating counterfactual trajectories in finite partially observable Markov Decision Processes (POMDPs). We see this as a useful procedure for off-policy "debugging" in high-risk settings (e.g., healthcare); by decomposing the expected difference in reward between the RL and observed policy into specific episodes, we can identify episodes where the counterfactual difference in reward is most dramatic. This in turn can be used to facilitate review of specific episodes by domain experts. We demonstrate the utility of this procedure with a synthetic environment of sepsis management.
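
The counterfactual step named in the title can be illustrated with a minimal sketch. It assumes the standard Gumbel-Max construction for a single categorical variable: the outcome is the argmax of log-probabilities plus independent Gumbel noise, and a counterfactual is obtained by sampling noise consistent with the observed outcome and reusing it under different probabilities. The function names `sample_posterior_gumbels` and `counterfactual_outcome` are hypothetical, all probabilities are assumed strictly positive, and this is not taken from the authors' repository.

```python
import numpy as np

def sample_posterior_gumbels(log_p, observed, rng):
    """Sample Gumbel noise g such that argmax(log_p + g) == observed.

    Assumes log_p holds finite log-probabilities (no zero-probability outcomes).
    """
    # The maximum of the shifted Gumbels is itself Gumbel-distributed,
    # with location logsumexp(log_p); it is assigned to the observed index.
    top = rng.gumbel(loc=np.logaddexp.reduce(log_p))
    # Unconstrained shifted Gumbels for every index...
    free = rng.gumbel(loc=log_p)
    # ...truncated so that each one falls below the sampled maximum.
    shifted = -np.logaddexp(-top, -free)
    shifted[observed] = top
    # Remove the location shift to recover the noise terms themselves.
    return shifted - log_p

def counterfactual_outcome(p_obs, p_cf, observed, rng):
    """Outcome under p_cf, reusing noise consistent with `observed` under p_obs."""
    g = sample_posterior_gumbels(np.log(p_obs), observed, rng)
    return int(np.argmax(np.log(p_cf) + g))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p_obs = np.array([0.2, 0.5, 0.3])   # illustrative probabilities under the observed policy
    p_cf = np.array([0.6, 0.1, 0.3])    # illustrative probabilities under the alternative policy
    draws = [counterfactual_outcome(p_obs, p_cf, 1, rng) for _ in range(1000)]
    print(np.bincount(draws, minlength=3) / 1000)
```

Two properties follow from this construction: if the probabilities are unchanged, the counterfactual outcome equals the observed one, and if the observed outcome becomes relatively more likely while every other outcome becomes relatively less likely, the counterfactual outcome cannot change.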

Code & Models

Repositories

clinicalml/gumbel-max-scm (official)

Taxonomy

Topics: Health Systems, Economic Evaluations, Quality of Life · Health Policy Implementation Science