Offline Policy Evaluation and Optimization under Confounding
Chinmaya Kausik, Yangyi Lu, Kevin Tan, Maggie Makar, Yixin Wang, Ambuj Tewari

TL;DR
This paper addresses the challenge of offline policy evaluation and optimization in Markov Decision Processes with unobserved confounders, proposing algorithms with theoretical guarantees and demonstrating improved performance in simulated environments.
Contribution
It introduces new algorithms for offline policy evaluation and improvement under confounding, with theoretical guarantees and empirical validation in gridworld and healthcare simulations.
Findings
The model-based method provides tighter lower bounds on policy value in the gridworld experiments.
The proposed algorithms outperform confounder-oblivious benchmarks in a sepsis management simulation.
The theoretical analysis characterizes when consistent value estimation is possible.
Abstract
Evaluating and optimizing policies in the presence of unobserved confounders is a problem of growing interest in offline reinforcement learning. Using conventional methods for offline RL in the presence of confounding can not only lead to poor decisions and poor policies, but also have disastrous effects in critical applications such as healthcare and education. We map out the landscape of offline policy evaluation for confounded MDPs, distinguishing assumptions on confounding based on whether they are memoryless and on their effect on the data-collection policies. We characterize settings where consistent value estimates are provably not achievable, and provide algorithms with guarantees to instead estimate lower bounds on the value. When consistent estimates are achievable, we provide algorithms for value estimation with sample complexity guarantees. We also present new algorithms for…
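To make the abstract's warning about conventional methods concrete, the sketch below works through a one-step toy problem in which the data-collection policy depends on an unobserved binary confounder `u`. It is not one of the paper's algorithms. The names `P_U`, `BEHAVIOR`, `REWARD`, `true_value`, and `naive_value`, and all the numbers, are assumptions chosen for illustration. It compares the true value of always taking an action with the confounder-oblivious estimate obtained by averaging logged rewards for that action.

```python
# Toy one-step confounded decision problem.
# All probabilities and rewards are illustrative assumptions, not from the paper.

# Prior over the unobserved binary confounder u.
P_U = {0: 0.5, 1: 0.5}

# BEHAVIOR[u][a]: probability that the data-collection policy picks action a
# when the confounder equals u. The policy sees u; the analyst does not.
BEHAVIOR = {
    0: {0: 0.9, 1: 0.1},
    1: {0: 0.1, 1: 0.9},
}

# REWARD[(u, a)]: expected reward. Action 1 lowers the reward by 0.2 for every u.
REWARD = {
    (0, 0): 0.0, (0, 1): -0.2,
    (1, 0): 1.0, (1, 1): 0.8,
}


def true_value(a):
    """Value of always playing action a, averaging over the confounder prior."""
    return sum(P_U[u] * REWARD[(u, a)] for u in P_U)


def naive_value(a):
    """Confounder-oblivious estimate: the mean logged reward among samples
    where the behavior policy chose a, i.e. E[r | a] under the logging data."""
    p_a = sum(P_U[u] * BEHAVIOR[u][a] for u in P_U)
    return sum(P_U[u] * BEHAVIOR[u][a] / p_a * REWARD[(u, a)] for u in P_U)


for a in (0, 1):
    print(f"action {a}: true value {true_value(a):.2f}, "
          f"naive estimate {naive_value(a):.2f}")
```

With these made-up numbers, the naive estimate ranks action 1 above action 0 (0.70 versus 0.10), while the true values rank them the other way (0.30 versus 0.50). The behavior policy chooses action 1 mostly when `u = 1`, which is also when rewards are high, so the logged data credit the action with the confounder's effect.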
Taxonomy
Topics: Health Systems, Economic Evaluations, Quality of Life · Sepsis Diagnosis and Treatment · Healthcare Operations and Scheduling Optimization
