Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

Yu-Xiang Wang; Alekh Agarwal; Miroslav Dudik

arXiv:1612.01205·stat.ML·November 15, 2017·20 cites

Open Access 3 Repos

TL;DR

This paper studies off-policy evaluation in contextual bandits when no consistent reward model is available. It establishes a minimax lower bound on the mean squared error and proposes a new estimator that uses an existing, possibly inconsistent, reward model to improve accuracy.

Contribution

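It introduces the SWITCH estimator, which uses an existing reward model (not necessarily consistent) to achieve a better bias-variance tradeoff than IPS and DR. The paper proves an upper bound on the estimator's MSE and demonstrates its benefits empirically on diverse datasets.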
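This page does not state the estimator's form. As a hedged sketch, one common way to write a switching estimator of this kind is given below. The notation is an assumption introduced here for illustration: logged tuples (x_i, a_i, r_i) for i = 1..n, logging policy \mu, target policy \pi, importance weight \rho(x,a) = \pi(a|x)/\mu(a|x), reward model \hat r, and a threshold \tau \ge 0.

```latex
\hat v_{\mathrm{SWITCH}}
  = \frac{1}{n}\sum_{i=1}^{n} r_i\,\rho(x_i,a_i)\,\mathbf{1}\{\rho(x_i,a_i)\le\tau\}
  + \frac{1}{n}\sum_{i=1}^{n}\sum_{a}\hat r(x_i,a)\,\pi(a\mid x_i)\,\mathbf{1}\{\rho(x_i,a)>\tau\}
```

Under this form, importance weighting is used where the weight is small, and the reward model takes over where the weight is large. As \tau \to \infty the expression reduces to IPS, and at \tau = 0 it reduces to the purely model-based estimate.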

Findings

01 A minimax lower bound on the MSE is established for the agnostic setting; IPS and DR match it up to constants.

02 The SWITCH estimator achieves a better bias-variance tradeoff than IPS and DR.

03 Empirical results demonstrate the benefits of SWITCH.

Abstract

We study the off-policy evaluation problem---estimating the value of a target policy using data collected by another policy---under the contextual bandit model. We consider the general (agnostic) setting without access to a consistent model of rewards and establish a minimax lower bound on the mean squared error (MSE). The bound is matched up to constants by the inverse propensity scoring (IPS) and doubly robust (DR) estimators. This highlights the difficulty of the agnostic contextual setting, in contrast with multi-armed bandits and contextual bandits with access to a consistent reward model, where IPS is suboptimal. We then propose the SWITCH estimator, which can use an existing reward model (not necessarily consistent) to achieve a better bias-variance tradeoff than IPS and DR. We prove an upper bound on its MSE and demonstrate its benefits empirically on a diverse collection of…
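The three estimators named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the array layout (one row per logged round, one column per action), and the threshold parameter `tau` are all introduced here, and the logging policy is assumed to give every action nonzero probability.

```python
import numpy as np

def ips_estimate(a, r, pi_all, mu_all):
    """Inverse propensity scoring: reweight logged rewards by pi/mu."""
    idx = np.arange(len(a))
    rho = pi_all[idx, a] / mu_all[idx, a]  # weight of the logged action
    return float(np.mean(rho * r))

def dr_estimate(a, r, pi_all, mu_all, r_hat_all):
    """Doubly robust: model-based estimate plus a weighted residual correction."""
    idx = np.arange(len(a))
    rho = pi_all[idx, a] / mu_all[idx, a]
    direct = np.sum(pi_all * r_hat_all, axis=1)  # model value under the target policy
    return float(np.mean(direct + rho * (r - r_hat_all[idx, a])))

def switch_estimate(a, r, pi_all, mu_all, r_hat_all, tau):
    """Switching sketch (assumed form): IPS where the weight is at most tau,
    reward model on actions whose weight exceeds tau."""
    idx = np.arange(len(a))
    rho_all = pi_all / mu_all                # weights for every (context, action)
    rho = rho_all[idx, a]
    ips_part = rho * r * (rho <= tau)        # keep only small-weight samples
    model_part = np.sum(pi_all * r_hat_all * (rho_all > tau), axis=1)
    return float(np.mean(ips_part + model_part))
```

Here `a` holds integer action indices, `r` the observed rewards, `pi_all` and `mu_all` the target and logging action probabilities, and `r_hat_all` the reward model's predictions. With `tau=np.inf` the switching sketch coincides with `ips_estimate`; with `tau=0` it returns the purely model-based value. Intermediate thresholds trade the variance of large importance weights for the bias of the reward model.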

Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques · Machine Learning and Algorithms