Model-Free and Model-Based Policy Evaluation when Causality is Uncertain
David Bruns-Smith

TL;DR
This paper studies off-policy evaluation when unobserved confounders influence both the dynamics and the behavior policy. It proposes finite-horizon worst-case bounds, uses robust MDPs to sharpen them with domain knowledge, and shows that persistent confounders make the problem far harder.
Contribution
It derives finite-horizon worst-case bounds for off-policy evaluation when unobserved confounders are drawn iid each period, and shows that a model-based approach with robust MDPs gives sharper lower bounds by exploiting domain knowledge about the dynamics.
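To make the robust-MDP idea concrete, here is a minimal tabular sketch. It assumes a finite-horizon MDP with a nominal transition model P_hat estimated from the logged data and a hypothetical ratio-style uncertainty set: each true next-state probability may differ from the nominal one by at most a factor lam. This set, the function names worst_case_expectation and robust_lower_bound, and the stationary evaluation policy pi are illustrative assumptions, not the paper's exact sensitivity model. Because the adversary picks transitions independently at every step and state-action pair, the worst case can be computed by backward induction; the result lower-bounds the evaluation policy's value only if the true interventional transitions lie inside the set.

```python
import numpy as np

def worst_case_expectation(p_hat, v, lam):
    """Minimize sum_s' p(s') * v(s') over the hypothetical ratio set
    {p : p_hat / lam <= p <= lam * p_hat, sum(p) = 1}, assuming lam >= 1."""
    lo = p_hat / lam
    hi = p_hat * lam
    p = lo.copy()                      # start every next state at its minimum mass
    budget = 1.0 - p.sum()             # probability mass still to place
    for s in np.argsort(v):            # give extra mass to the lowest-value states first
        add = min(hi[s] - lo[s], budget)
        p[s] += add
        budget -= add
        if budget <= 0:
            break
    return float(p @ v)

def robust_lower_bound(P_hat, R, pi, horizon, lam):
    """Finite-horizon robust policy evaluation by backward induction.

    P_hat: (S, A, S) nominal transitions estimated from data.
    R:     (S, A) expected rewards.
    pi:    (S, A) evaluation-policy probabilities (stationary for simplicity).
    lam:   scalar or (S, A) array of sensitivity factors >= 1 (an assumption).
    """
    S, A, _ = P_hat.shape
    lam = np.broadcast_to(np.asarray(lam, dtype=float), (S, A))
    if np.any(lam < 1):
        raise ValueError("lam must be >= 1")
    v = np.zeros(S)                    # value after the final step
    for _ in range(horizon):
        q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Adversary picks the worst transition law in the set for (s, a).
                q[s, a] = R[s, a] + worst_case_expectation(P_hat[s, a], v, lam[s, a])
        v = (pi * q).sum(axis=1)       # average over the evaluation policy's actions
    return v
```

With lam = 1 the set collapses to P_hat and the recursion reduces to ordinary policy evaluation; larger lam widens the set and lowers the bound. Domain knowledge could enter by shrinking the set, for example setting lam = 1 for state-action pairs known to be unconfounded; that choice is our illustration, not a detail taken from the paper.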
Findings
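How informative the worst-case bounds are depends on whether confounders are drawn iid each period or persist over time.
Model-based robust MDPs that encode domain knowledge about the dynamics give sharper lower bounds than model-free worst-case bounds.
Persistent confounders make off-policy evaluation far more difficult, and existing techniques then produce extremely conservative bounds.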
Abstract
When decision-makers can directly intervene, policy evaluation algorithms give valid causal estimates. In off-policy evaluation (OPE), there may exist unobserved variables that both impact the dynamics and are used by the unknown behavior policy. These "confounders" will introduce spurious correlations, and naive estimates for a new policy will be biased. We develop worst-case bounds to assess sensitivity to these unobserved confounders in finite horizons when confounders are drawn iid each period. We demonstrate that a model-based approach with robust MDPs gives sharper lower bounds by exploiting domain knowledge about the dynamics. Finally, we show that when unobserved confounders are persistent over time, OPE is far more difficult and existing techniques produce extremely conservative bounds.
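The bias described in the abstract can be seen in a toy one-step simulation (a simplification of the sequential setting). All numbers here are hypothetical and not taken from the paper: a hidden binary u is drawn iid, the behavior policy takes action 1 far more often when u = 1, and u also raises the reward. The target policy always takes action 1, so its true value is 0.5 + E[u] = 1.0. A naive importance-sampling estimate that can only use the marginal propensity P(a = 1) mistakes the effect of u for an effect of the action, while an oracle that knows the u-dependent propensity recovers the truth.

```python
import numpy as np

def simulate_confounded_logs(n, seed=0):
    """Hypothetical one-step logging process with an unobserved confounder u."""
    rng = np.random.default_rng(seed)
    u = rng.integers(0, 2, size=n)                  # hidden confounder, iid Bernoulli(0.5)
    p_act = np.where(u == 1, 0.9, 0.1)              # behavior policy looks at u
    a = (rng.random(n) < p_act).astype(int)
    r = 0.5 * a + u + 0.1 * rng.standard_normal(n)  # u also drives the reward
    return u, a, r, p_act

u, a, r, p_act = simulate_confounded_logs(200_000)

# Target policy: always take a = 1. True value = 0.5 + E[u] = 1.0.
true_value = 1.0

# Naive IS: the analyst only sees (a, r), so the propensity is the marginal P(a = 1).
p_marginal = a.mean()
naive = np.mean((a == 1) * r / p_marginal)

# Oracle IS: uses the true propensity, which depends on the hidden u.
oracle = np.mean((a == 1) * r / p_act)

print(f"true={true_value:.2f} naive={naive:.2f} oracle={oracle:.2f}")
```

Working the expectations by hand, the naive estimate converges to E[r | a = 1] = 1.4 rather than 1.0, because most logged a = 1 rows come from u = 1. Sensitivity bounds like the paper's replace the unavailable oracle propensity with a worst case over plausible values of it.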
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Advanced Causal Inference Techniques · Economic Policies and Impacts
