Estimating Optimal Policy Value in General Linear Contextual Bandits

Jonathan N. Lee; Weihao Kong; Aldo Pacchiano; Vidya Muthukumar; Emma Brunskill

arXiv:2302.09451 · cs.LG · February 21, 2023

TL;DR

This paper investigates the challenge of estimating the maximum achievable reward in linear contextual bandits before learning the optimal policy, providing new theoretical bounds and practical algorithms for general distributions.

Contribution

It introduces the first sublinear sample complexity algorithms for $V^*$ estimation in general distributions, extending beyond Gaussian covariates, with applications in model selection and treatment effect testing.

Findings

01

Sublinear $\tilde{O}(\sqrt{d})$ estimation of $V^*$ is information-theoretically possible.

02

A practical algorithm estimates a tight, problem-dependent upper bound on $V^*$ for general distributions.

03

The proposed methods improve guarantees in bandit model selection and treatment effect testing.

Abstract

In many bandit problems, the maximal reward achievable by a policy is often unknown in advance. We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable. We refer to this as $V^{*}$ estimation. It was recently shown that fast $V^{*}$ estimation is possible but only in disjoint linear bandits with Gaussian covariates. Whether this is possible for more realistic context distributions has remained an open and important question for tasks such as model selection. In this paper, we first provide lower bounds showing that this general problem is hard. However, under stronger assumptions, we give an algorithm and analysis proving that $\tilde{O}(\sqrt{d})$ sublinear estimation of $V^{*}$ is indeed information-theoretically possible, where $d$ is the dimension. We then present a more practical,…
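To make the quantity in the abstract concrete, the sketch below computes $V^{*}$ by Monte Carlo for a known parameter and contrasts it with a naive plug-in estimate from logged data. This is not the paper's algorithm: it is an illustrative baseline under assumptions chosen here (a disjoint linear model with one parameter vector per action, standard Gaussian contexts, and the hypothetical helper names `true_v_star` and `plugin_v_star`). The plug-in route first learns the parameters, which typically needs on the order of $d$ samples, and that is the cost the sublinear regime aims to avoid.

```python
import numpy as np


def true_v_star(theta, n_contexts=200_000, seed=0):
    """Monte Carlo value of V* = E_x[max_a <x, theta_a>] for a KNOWN theta.

    theta: array of shape (n_actions, d), one row per action (assumed
    disjoint linear model). Contexts are drawn as x ~ N(0, I_d), which is
    an assumption of this sketch, not a requirement of the problem.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_contexts, theta.shape[1]))
    # Expected reward of the best action in each context, averaged over contexts.
    return float((x @ theta.T).max(axis=1).mean())


def plugin_v_star(x, a, r, n_actions, reg=1e-3):
    """Naive plug-in estimate: ridge-regress theta_a per action, then average
    the best predicted reward over the observed contexts.

    x: contexts (n, d); a: integer actions taken (n,); r: observed rewards (n,).
    """
    d = x.shape[1]
    theta_hat = np.zeros((n_actions, d))
    for k in range(n_actions):
        xk, rk = x[a == k], r[a == k]
        gram = xk.T @ xk + reg * np.eye(d)  # regularised Gram matrix
        theta_hat[k] = np.linalg.solve(gram, xk.T @ rk)
    return float((x @ theta_hat.T).max(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n_actions, n = 20, 3, 5000
    theta = rng.standard_normal((n_actions, d)) / np.sqrt(d)
    x = rng.standard_normal((n, d))
    a = rng.integers(0, n_actions, size=n)  # uniformly random logging policy
    r = np.sum(x * theta[a], axis=1) + 0.1 * rng.standard_normal(n)
    print("Monte Carlo V*:", true_v_star(theta))
    print("Plug-in V* estimate:", plugin_v_star(x, a, r, n_actions))
```

With plenty of data relative to $d$, the two numbers should be close; shrinking `n` well below `d` per action is a simple way to see the plug-in estimate degrade.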

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reservoir Engineering and Simulation Methods