Estimating Optimal Policy Value in General Linear Contextual Bandits
Jonathan N. Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill

TL;DR
This paper investigates the challenge of estimating the maximum achievable reward in linear contextual bandits before learning the optimal policy, providing new theoretical bounds and practical algorithms for general distributions.
Contribution
It introduces the first sublinear sample complexity algorithms for $V^*$ estimation in general distributions, extending beyond Gaussian covariates, with applications in model selection and treatment effect testing.
Findings
Sublinear $\tilde{O}(\sqrt{d})$ estimation of $V^*$ is information-theoretically possible.
A practical algorithm estimates a tight, problem-dependent upper bound on $V^*$ for general distributions.
The proposed methods improve guarantees in bandit model selection and treatment effect testing.
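To make the target quantity concrete, here is a minimal sketch of what $V^*$ denotes. It assumes the standard linear contextual bandit model, in which the expected reward of action $a$ in context $x$ is $\langle \phi(x,a), \theta \rangle$ and $V^* = \mathbb{E}_x[\max_a \langle \phi(x,a), \theta \rangle]$. The sketch computes this value by Monte Carlo with a known $\theta$, which is exactly the information a learner does not have; it is not the paper's estimator. The function names, the Gaussian features, and the chosen sizes are illustrative assumptions.

```python
import numpy as np


def optimal_policy_value(features, theta):
    """Average, over sampled contexts, of the best action's expected reward.

    features: array of shape (n_contexts, n_actions, d), one feature
              vector phi(x, a) per context-action pair (hypothetical layout).
    theta:    array of shape (d,), the reward parameter, assumed known here.
    """
    expected_rewards = features @ theta          # shape (n_contexts, n_actions)
    return expected_rewards.max(axis=1).mean()   # mean over contexts of max over actions


def sample_gaussian_features(n_contexts, n_actions, d, seed=0):
    """Illustrative context distribution: i.i.d. standard Gaussian features."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_contexts, n_actions, d))


# Example: d = 5 dimensions, 4 actions, 20,000 sampled contexts.
theta_example = np.ones(5) / np.sqrt(5)          # unit-norm parameter (assumption)
features_example = sample_gaussian_features(20_000, 4, 5)
v_star_example = optimal_policy_value(features_example, theta_example)
```

With enough sampled contexts this average approximates $V^*$ for the chosen distribution. The estimation problem studied in the paper is harder because $\theta$ is unknown and the number of reward samples is sublinear in the dimension $d$.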
Abstract
In many bandit problems, the maximal reward achievable by a policy is often unknown in advance. We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable. We refer to this as $V^*$ estimation. It was recently shown that fast $V^*$ estimation is possible but only in disjoint linear bandits with Gaussian covariates. Whether this is possible for more realistic context distributions has remained an open and important question for tasks such as model selection. In this paper, we first provide lower bounds showing that this general problem is hard. However, under stronger assumptions, we give an algorithm and analysis proving that sublinear $\tilde{O}(\sqrt{d})$ estimation of $V^*$ is indeed information-theoretically possible, where $d$ is the dimension. We then present a more practical,…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reservoir Engineering and Simulation Methods
