Quantile Off-Policy Evaluation via Deep Conditional Generative Learning

Yang Xu, Chengchun Shi, Shikai Luo, Lan Wang, and Rui Song

arXiv:2212.14466 · stat.ML · January 2, 2023

Open Access

TL;DR

This paper introduces a doubly-robust inference procedure for quantile off-policy evaluation (OPE) in sequential decision making. It leverages deep conditional generative models and targets settings with skewed reward distributions, where quantiles give a more robust summary of a policy's performance than the mean.

Contribution

It develops a new quantile-focused OPE estimator based on deep conditional generative learning, capturing the variability and heavy tails of reward distributions that mean-based evaluation ignores.
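For intuition about what a quantile OPE estimator targets, the sketch below computes the simplest importance-weighted (IPW) estimate of a target policy's reward quantile in a single-decision setting and contrasts it with the IPW mean on skewed rewards. It is a generic baseline, not the doubly-robust, generative-model-based estimator the paper proposes; `weighted_quantile`, `ipw_quantile_ope`, and the simulated data are hypothetical.

```python
import numpy as np


def weighted_quantile(rewards, weights, tau):
    """Smallest reward q whose self-normalized weighted empirical CDF reaches tau."""
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(rewards)
    r_sorted, w_sorted = rewards[order], weights[order]
    cdf = np.cumsum(w_sorted) / w_sorted.sum()
    idx = int(np.searchsorted(cdf, tau, side="left"))
    return r_sorted[min(idx, len(r_sorted) - 1)]  # guard against round-off at tau = 1


def ipw_quantile_ope(rewards, behavior_prob, target_prob, tau):
    """Importance-weighted tau-quantile of the target policy's reward.

    behavior_prob[i] / target_prob[i] is the probability the logged action i had
    under the behavior / target policy (single decision point only).
    """
    weights = np.asarray(target_prob, dtype=float) / np.asarray(behavior_prob, dtype=float)
    return weighted_quantile(rewards, weights, tau)


# Hypothetical logged data: a uniform behavior policy over actions {0, 1};
# action 1 yields heavy-tailed log-normal rewards, action 0 near-constant ones.
rng = np.random.default_rng(0)
n = 20_000
actions = rng.integers(0, 2, size=n)
rewards = np.where(actions == 1,
                   rng.lognormal(mean=0.0, sigma=1.0, size=n),
                   rng.normal(loc=0.5, scale=0.1, size=n))
behavior_prob = np.full(n, 0.5)
target_prob = (actions == 1).astype(float)  # target policy always picks action 1

median_est = ipw_quantile_ope(rewards, behavior_prob, target_prob, tau=0.5)
mean_est = np.mean(target_prob / behavior_prob * rewards)  # IPW mean, for contrast
print(f"IPW median: {median_est:.3f}   IPW mean: {mean_est:.3f}")
```

For log-normal(0, 1) rewards the true median is 1 while the true mean is e^0.5 (about 1.65), so the two summaries diverge under skew, and the mean is driven by a small number of very large rewards.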

Findings

01. Outperforms classical mean-based OPE estimators in heavy-tailed settings

02. Demonstrates effectiveness on real-world data from a short-video platform

03. Provides asymptotic guarantees for the proposed estimator

Abstract

Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy. It is critical in a number of sequential decision-making problems, in fields ranging from healthcare to the technology industry. Most existing work focuses on evaluating the mean outcome of a given policy and ignores the variability of the outcome. However, in a variety of applications, criteria other than the mean may be more sensible. For example, when the reward distribution is skewed and asymmetric, quantile-based metrics are often preferred for their robustness. In this paper, we propose a doubly-robust inference procedure for quantile OPE in sequential decision making and study its asymptotic properties. In particular, we propose utilizing state-of-the-art deep conditional generative learning methods to handle…
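The doubly-robust idea can be illustrated in a deliberately simplified single-decision setting. The sketch below estimates the target policy's reward CDF as an importance-weighted residual term, (indicator that Y_i is at most q, minus the model's conditional CDF), plus a model-based term that averages the conditional CDF over the target policy's actions. It then inverts that estimate at level tau. Using a conditional generative model to approximate the conditional CDF, which must be re-evaluated at every candidate quantile, is an assumption made for illustration; the paper's sequential estimator and inference procedure are more involved. The `generator` function, the simulated environment, and all parameter values are hypothetical, and `generator` simply samples from the true simulator in place of a trained deep model.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4000, 100   # logged sample size; generated draws per (x_i, a)
tau = 0.5          # quantile level of interest (the median)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def generator(x, a, m, rng):
    """Hypothetical stand-in for a trained conditional generator of Y | X, A.

    Returns a (len(x), m) array of draws. For illustration it samples from the
    true simulated reward model Y = X + A + t_5 noise; a real pipeline would
    use a learned deep generative model instead.
    """
    x = np.asarray(x, dtype=float)[:, None]
    return x + a + rng.standard_t(df=5, size=(x.shape[0], m))


# Simulated logged data from a behavior policy b(1 | x) = sigmoid(0.5 x).
x = rng.normal(size=n)
p1 = sigmoid(0.5 * x)
a = (rng.random(n) < p1).astype(int)
y = x + a + rng.standard_t(df=5, size=n)

# Target policy pi: always choose action 1 (so its true reward median is 1).
pi_all = np.column_stack([np.zeros(n), np.ones(n)])   # pi(a | x_i) for a = 0, 1
b_obs = np.where(a == 1, p1, 1.0 - p1)                # b(A_i | X_i)
w = pi_all[np.arange(n), a] / b_obs                   # importance weights

# Generated draws for every (x_i, a): shape (n, 2, m).
gen = np.stack([generator(x, act, m, rng) for act in (0, 1)], axis=1)


def dr_cdf(q):
    """Doubly-robust estimate of P(Y under pi <= q) at one threshold q."""
    f_all = (gen <= q).mean(axis=2)            # F_hat(q | x_i, a), shape (n, 2)
    f_obs = f_all[np.arange(n), a]             # F_hat(q | x_i, A_i)
    model_term = (pi_all * f_all).sum(axis=1)  # sum_a pi(a | x_i) F_hat(q | x_i, a)
    resid = (y <= q).astype(float) - f_obs     # observed indicator minus model
    return np.mean(w * resid + model_term)


# Invert the estimated CDF on a grid: smallest q whose DR CDF reaches tau.
# (The DR CDF estimate need not be monotone, so this takes the first crossing.)
grid = np.linspace(-3.0, 5.0, 201)
cdf_vals = np.array([dr_cdf(q) for q in grid])
q_hat = grid[np.argmax(cdf_vals >= tau)]
print(f"DR estimate of the target policy's median reward: {q_hat:.3f}")
```

Because the importance weights here use the true behavior propensities, swapping `generator` for a deliberately wrong sampler should, in large samples, still leave `q_hat` near 1; robustness to a misspecified outcome model (or, symmetrically, to misspecified weights when the model is correct) is what "doubly robust" refers to in this generic construction.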

Taxonomy

Topics: Health Systems, Economic Evaluations, Quality of Life