Empirical Likelihood for Contextual Bandits

Nikos Karampatziakis; John Langford; Paul Mineiro

arXiv:1906.03323 · cs.LG · October 20, 2020 · 5 citations

TL;DR

This paper introduces a new empirical likelihood-based estimator and confidence interval for off-policy evaluation in contextual bandits, enabling more reliable policy value estimation and optimization from limited data.

Contribution

It formulates an empirical likelihood estimator and confidence interval for off-policy evaluation as simple convex optimization problems, and proposes a policy optimization method that searches for policies with a large lower confidence bound on reward.

Findings

01

The estimator and confidence interval improve over previous proposals in finite-sample regimes

02

The policy optimization algorithm searches for policies with a large lower bound on reward

03

The policy optimization algorithm outperforms a strong baseline system for learning from off-policy data

Abstract

We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end, we apply empirical likelihood techniques to formulate our estimator and confidence interval as simple convex optimization problems. Using the lower bound of our confidence interval, we then propose an off-policy policy optimization algorithm that searches for policies with a large reward lower bound. We empirically find that both our estimator and confidence interval improve over previous proposals in finite sample regimes. Finally, the policy optimization algorithm we propose outperforms a strong baseline system for learning from off-policy data.
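To make the idea of "an estimator as a simple convex optimization problem" concrete, here is a minimal sketch of textbook empirical likelihood applied to off-policy value estimation. It is not the authors' implementation (see the official repository for that) and it simplifies the setting: it assumes the probability mass sits only on the observed samples, and it uses the single moment constraint that importance weights average to 1 under the logging distribution. The names `el_weights` and `el_value` are hypothetical.

```python
def el_weights(w, iters=100):
    """Empirical likelihood sample weights p_i (illustrative sketch).

    Maximizes sum(log p_i) subject to sum(p_i) = 1 and sum(p_i * w_i) = 1,
    where w_i are importance weights pi(a|x) / h(a|x).
    The solution has the form p_i = 1 / (n * (1 + lam * (w_i - 1))).
    """
    n = len(w)
    d = [wi - 1.0 for wi in w]
    if not (min(d) < 0.0 < max(d)):
        # The constraint is only satisfiable if 1 lies strictly inside
        # the range of the observed importance weights.
        raise ValueError("need min(w) < 1 < max(w)")

    # Feasible multipliers keep every 1 + lam * d_i strictly positive.
    lo = -1.0 / max(d)
    hi = -1.0 / min(d)

    def g(lam):
        # Derivative of the dual objective; strictly decreasing in lam.
        return sum(di / (1.0 + lam * di) for di in d)

    # Bisection on the one-dimensional dual problem.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [1.0 / (n * (1.0 + lam * di)) for di in d]


def el_value(w, r):
    """Value estimate sum(p_i * w_i * r_i) under the reweighted sample."""
    p = el_weights(w)
    return sum(pi * wi * ri for pi, wi, ri in zip(p, w, r))
```

When the observed importance weights already average to 1, the multiplier is zero, every sample gets weight 1/n, and this sketch reduces to the ordinary importance-weighted estimate. Otherwise the samples are reweighted so that the constraint holds exactly.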

Code & Models

Repositories

pmineiro/elfcb
none · Official

Videos

Empirical Likelihood for Contextual Bandits · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Reinforcement Learning in Robotics