Optimal Exploration is no harder than Thompson Sampling

Zhaoqi Li; Kevin Jamieson; Lalit Jain

arXiv:2310.06069 · stat.ML · October 26, 2023 · 2 cites

Open Access

TL;DR

This paper introduces a new pure exploration linear bandit algorithm that matches the optimal convergence rate of more complex methods while only requiring the same simple computational primitives as Thompson Sampling.

Contribution

The authors propose a novel algorithm that achieves optimal exploration efficiency using only sampling and argmax oracles, simplifying implementation compared to existing methods.

Findings

01. Achieves exponential convergence rate with optimal exponent.

02. Performs empirically as well as existing asymptotically optimal methods.

03. Requires only simple sampling and argmax operations, avoiding costly projections.

Abstract

Given a set of arms $Z \subset \mathbb{R}^{d}$ and an unknown parameter vector $\theta_{*} \in \mathbb{R}^{d}$, the pure exploration linear bandit problem aims to return $\arg\max_{z \in Z} z^{\top} \theta_{*}$, with high probability through noisy measurements of $x^{\top} \theta_{*}$ with $x \in X \subset \mathbb{R}^{d}$. Existing (asymptotically) optimal methods require either a) potentially costly projections for each arm $z \in Z$ or b) explicitly maintaining a subset of $Z$ under consideration at each time. This complexity is at odds with the popular and simple Thompson Sampling algorithm for regret minimization, which just requires access to a posterior sampling and argmax oracle, and does not need to enumerate $Z$ at any point. Unfortunately, Thompson sampling is known to be sub-optimal for pure exploration. In this work, we…
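
To make the two primitives named in the abstract concrete, the following is a minimal sketch of a posterior sampling oracle and an argmax oracle, wired together as plain linear Thompson Sampling. It is not the paper's algorithm. The Gaussian prior, the unit-variance noise, the toy arm set, and the function names `posterior_sample`, `argmax_oracle`, and `thompson_step` are all assumptions made for illustration.

```python
import numpy as np

def posterior_sample(V, b, rng):
    """Sampling oracle: draw theta ~ N(V^{-1} b, V^{-1}).

    Assumes a Gaussian prior and unit-variance Gaussian noise, so the
    posterior is Gaussian with precision matrix V.
    """
    cov = np.linalg.inv(V)
    mean = cov @ b
    return rng.multivariate_normal(mean, cov)

def argmax_oracle(arms, theta):
    """Argmax oracle: index of the row of `arms` maximizing z^T theta."""
    return int(np.argmax(arms @ theta))

def thompson_step(X, V, b, theta_star, rng):
    """One round of linear Thompson Sampling (illustrative only)."""
    theta_tilde = posterior_sample(V, b, rng)   # sample a plausible parameter
    x = X[argmax_oracle(X, theta_tilde)]        # measure the arm it favors
    y = x @ theta_star + rng.normal()           # noisy observation of x^T theta_*
    return V + np.outer(x, x), b + y * x        # rank-one posterior update

rng = np.random.default_rng(0)
d = 3
X = np.eye(d)                  # toy measurement set; here Z = X (assumption)
Z = X
theta_star = np.array([1.0, 0.5, 0.0])  # hypothetical unknown parameter
V = np.eye(d)                  # prior precision (assumption: standard normal prior)
b = np.zeros(d)

for _ in range(200):
    V, b = thompson_step(X, V, b, theta_star, rng)

theta_hat = np.linalg.solve(V, b)       # posterior mean
best = argmax_oracle(Z, theta_hat)      # recommended arm
```

Neither oracle enumerates per-arm projections or maintains a set of surviving arms, which is the simplicity the abstract contrasts with existing optimal methods. As the abstract notes, this plain Thompson Sampling loop is known to be sub-optimal for pure exploration.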

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems