Off-Policy Evaluation with Policy-Dependent Optimization Response

Wenshuo Guo; Michael I. Jordan; Angela Zhou

arXiv:2202.12958 · cs.LG · November 8, 2022 · 1 citation

TL;DR

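This paper introduces a framework for off-policy evaluation in which the decision-maker's utility is the output of a policy-dependent linear optimization problem rather than an average of individual causal outcomes. It addresses the optimization bias that arises in this setting and enables optimization of the causal policy itself.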

Contribution

It develops unbiased estimators for policy-dependent causal outcomes and proposes a general algorithm for optimizing causal interventions.

Findings

01. Constructed unbiased estimators for policy-dependent estimands.

02. Analyzed asymptotic variance properties of estimators.

03. Validated theoretical results with numerical simulations.

Abstract

The intersection of causal inference and machine learning for decision-making is rapidly expanding, but the default decision criterion remains an average of individual causal outcomes across a population. In practice, various operational restrictions ensure that a decision-maker's utility is not realized as an average but rather as an output of a downstream decision-making problem (such as matching, assignment, network flow, minimizing predictive risk). In this work, we develop a new framework for off-policy evaluation with policy-dependent linear optimization responses: causal outcomes introduce stochasticity in objective function coefficients. Under this framework, a decision-maker's utility depends on the policy-dependent optimization, which introduces a fundamental challenge of optimization bias even for the case of policy evaluation. We…
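The optimization bias mentioned in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's estimator or algorithm; it only shows the general phenomenon that plugging noisy estimates of objective coefficients into a linear program yields an optimal value that is biased downward for a minimization problem, since the minimum is a concave function of the coefficients (Jensen's inequality). The feasible set (the probability simplex), the Gaussian noise model, and the names `lp_value_on_simplex` and `simulate_plugin_bias` are all assumptions made for illustration.

```python
import random
import statistics


def lp_value_on_simplex(costs):
    """Optimal value of min_x sum_j costs[j] * x[j] over the probability simplex.

    A linear objective attains its minimum at a vertex of the simplex,
    so the optimal value is simply the smallest coefficient.
    """
    return min(costs)


def simulate_plugin_bias(true_costs, noise_sd=1.0, n_reps=20000, seed=0):
    """Monte Carlo estimate of E[plug-in optimal value] - true optimal value.

    Each replication draws unbiased but noisy coefficient estimates
    (hypothetical Gaussian noise) and solves the LP with them plugged in.
    """
    rng = random.Random(seed)
    plugin_values = []
    for _ in range(n_reps):
        noisy_costs = [c + rng.gauss(0.0, noise_sd) for c in true_costs]
        plugin_values.append(lp_value_on_simplex(noisy_costs))
    return statistics.fmean(plugin_values) - lp_value_on_simplex(true_costs)


# Three equally good decisions: the true optimal value is 1.0.
true_costs = [1.0, 1.0, 1.0]
bias = simulate_plugin_bias(true_costs)
print(f"estimated bias of the plug-in optimal value: {bias:.3f}")
```

Although every coefficient estimate here is unbiased, the plug-in optimal value is systematically too low, because the optimizer selects whichever decision happens to look best under the noise. This is the kind of effect that makes naive evaluation unreliable when utility is the output of an optimization problem.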

Videos

Off-Policy Evaluation with Policy-Dependent Optimization Response · SlidesLive

Taxonomy

Topics: Advanced Causal Inference Techniques · Distributed Sensor Networks and Detection Algorithms · Statistical Methods and Inference