Towards Robust Off-Policy Evaluation via Human Inputs
Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, Himabindu Lakkaraju

TL;DR
This paper introduces ROPE, a human-in-the-loop framework for more realistic and less pessimistic off-policy evaluation under dataset shifts, especially in healthcare, by leveraging domain knowledge to focus on plausible environment changes.
Contribution
It proposes a novel framework that incorporates human inputs to restrict the considered dataset shifts in OPE, improving the realism and utility of policy evaluation.
Findings
ROPE effectively captures realistic dataset shifts in healthcare data.
The approach yields less pessimistic and more accurate policy utility estimates.
Algorithms are computationally efficient and backed by theoretical analysis.
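To illustrate why restricting the set of shifts gives less pessimistic estimates, here is a minimal sketch and not the paper's algorithm. It assumes shifts are bounded by a KL-divergence budget `delta` and uses the standard dual form of the worst-case mean from distributionally robust optimization, evaluated on a grid. The names `worst_case_mean`, `restricted_worst_case_mean`, `groups`, and `delta`, and the synthetic data, are hypothetical. The restricted variant assumes only the distribution of one discrete feature may shift, so the worst case is taken over the conditional mean utility given that feature.

```python
import numpy as np

def worst_case_mean(values, delta, lambdas=None):
    """Lower bound on the mean of `values` over all reweightings Q of the
    empirical distribution P with KL(Q || P) <= delta, using the dual
    sup_{lam > 0} -lam * log E_P[exp(-v / lam)] - lam * delta.
    The supremum is approximated on a grid, so the result stays a valid
    (possibly slightly loose) lower bound."""
    values = np.asarray(values, dtype=float)
    if lambdas is None:
        lambdas = np.logspace(-2, 3, 300)  # assumed grid, not tuned
    best = -np.inf
    for lam in lambdas:
        z = -values / lam
        m = z.max()  # shift for a numerically stable log-mean-exp
        log_mgf = m + np.log(np.mean(np.exp(z - m)))
        best = max(best, -lam * log_mgf - lam * delta)
    return best

def restricted_worst_case_mean(values, groups, delta):
    """Worst case when only the distribution of `groups` may shift:
    replace each utility by its within-group mean, then apply the bound."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    cond_mean = np.empty_like(values)
    for g in np.unique(groups):
        mask = groups == g
        cond_mean[mask] = values[mask].mean()
    return worst_case_mean(cond_mean, delta)

# Synthetic per-sample utility estimates (hypothetical data).
rng = np.random.default_rng(0)
groups = rng.integers(0, 3, size=2000)
values = 0.5 + 0.2 * groups + rng.normal(0.0, 1.0, size=2000)

plain = values.mean()
robust_all = worst_case_mean(values, delta=0.1)
robust_restricted = restricted_worst_case_mean(values, groups, delta=0.1)
print(f"no shift:         {plain:.3f}")
print(f"any shift:        {robust_all:.3f}")
print(f"restricted shift: {robust_restricted:.3f}")
```

By Jensen's inequality, the restricted estimate is never lower than the unrestricted one for the same budget, which is the sense in which constraining shifts to plausible ones reduces pessimism in this toy setting.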
Abstract
Off-policy Evaluation (OPE) methods are crucial tools for evaluating policies in high-stakes domains such as healthcare, where direct deployment is often infeasible, unethical, or expensive. When deployment environments are expected to undergo changes (that is, dataset shifts), it is important for OPE methods to perform robust evaluation of the policies amidst such changes. Existing approaches consider robustness against a large class of shifts that can arbitrarily change any observable property of the environment. This often results in highly pessimistic estimates of the utilities, thereby invalidating policies that might have been useful in deployment. In this work, we address the aforementioned problem by investigating how domain knowledge can help provide more realistic estimates of the utilities of policies. We leverage human inputs on which aspects of the environments may…
