Importance-Weighted Offline Learning Done Right

Germano Gabbianelli; Gergely Neu; Matteo Papini

arXiv:2309.15771·cs.LG·September 28, 2023

TL;DR

This paper introduces an improved importance-weighted offline learning method for stochastic contextual bandits, removing restrictive assumptions and achieving superior theoretical guarantees through a novel estimator and tail analysis.

Contribution

The paper presents a simple alternative approach based on the implicit exploration estimator, with performance guarantees superior to those of previous methods. It removes the uniform coverage assumption and extends the results to infinite policy classes.

Findings

01 Superior performance guarantees over previous methods

02 Removal of the uniform coverage assumption

03 Robustness demonstrated through numerical simulations

Abstract

We study the problem of offline policy optimization in stochastic contextual bandit problems, where the goal is to learn a near-optimal policy based on a dataset of decision data collected by a suboptimal behavior policy. Rather than making any structural assumptions on the reward function, we assume access to a given policy class and aim to compete with the best comparator policy within this class. In this setting, a standard approach is to compute importance-weighted estimators of the value of each policy, and select a policy that minimizes the estimated value up to a "pessimistic" adjustment subtracted from the estimates to reduce their random fluctuations. In this paper, we show that a simple alternative approach based on the "implicit exploration" estimator of Neu (2015) yields performance guarantees that are superior in nearly all possible terms to all previous results. Most…
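
To make the contrast in the abstract concrete, the following minimal Python sketch compares a standard importance-weighted value estimate with an implicit-exploration (IX) style estimate, which adds a positive constant to the behavior probability in the denominator of each weight. The function names, the parameter `gamma`, and the toy data are illustrative assumptions, not taken from the paper, and the sketch leaves out the policy-selection step entirely.

```python
import numpy as np


def ipw_value(target_probs, behavior_probs, rewards):
    """Standard importance-weighted estimate of a policy's value.

    target_probs[i]   : probability the evaluated policy assigns to the logged action
    behavior_probs[i] : probability the behavior policy assigned to the same action
    rewards[i]        : observed reward for sample i
    """
    weights = target_probs / behavior_probs
    return float(np.mean(weights * rewards))


def ix_value(target_probs, behavior_probs, rewards, gamma):
    """IX-style estimate: same as above, with gamma >= 0 added to the denominator.

    gamma is a hypothetical name for the smoothing constant; gamma = 0
    recovers the standard importance-weighted estimate.
    """
    weights = target_probs / (behavior_probs + gamma)
    return float(np.mean(weights * rewards))


# Toy logged data (made up for illustration). The second sample has a
# very small behavior probability, so its plain importance weight is large.
target_probs = np.array([0.9, 0.1, 0.9, 0.5])
behavior_probs = np.array([0.5, 0.01, 0.5, 0.5])
rewards = np.array([1.0, 1.0, 0.0, 1.0])

if __name__ == "__main__":
    print("IPW estimate:", ipw_value(target_probs, behavior_probs, rewards))
    print("IX estimate: ", ix_value(target_probs, behavior_probs, rewards, gamma=0.1))
```

With nonnegative rewards, the IX-style estimate can never exceed the plain importance-weighted one, and the gap is largest on samples where the behavior probability is small. In this sense the added constant acts as a built-in downward adjustment rather than a separately subtracted term.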

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Machine Learning and Algorithms