Optimal Exploration is no harder than Thompson Sampling
Zhaoqi Li, Kevin Jamieson, Lalit Jain

TL;DR
This paper introduces a new algorithm for pure exploration in linear bandits that matches the optimal convergence rate of more complex methods while requiring only the same simple computational primitives as Thompson Sampling.
Contribution
The authors propose a novel algorithm that explores at the asymptotically optimal rate using only sampling and argmax oracles, which makes it simpler to implement than existing methods.
Findings
Achieves an exponential convergence rate with the optimal exponent.
Performs empirically as well as existing asymptotically optimal methods.
Requires only simple sampling and argmax operations, avoiding costly projections.
Abstract
Given a set of arms $\mathcal{Z} \subset \mathbb{R}^d$ and an unknown parameter vector $\theta_\ast \in \mathbb{R}^d$, the pure exploration linear bandit problem aims to return $\arg\max_{z \in \mathcal{Z}} z^\top \theta_\ast$ with high probability through noisy measurements of $x^\top \theta_\ast$ with $x \in \mathcal{X} \subset \mathbb{R}^d$. Existing (asymptotically) optimal methods require either a) potentially costly projections for each arm $z \in \mathcal{Z}$ or b) explicitly maintaining a subset of $\mathcal{Z}$ under consideration at each time. This complexity is at odds with the popular and simple Thompson Sampling algorithm for regret minimization, which just requires access to a posterior sampling and argmax oracle, and does not need to enumerate $\mathcal{Z}$ at any point. Unfortunately, Thompson Sampling is known to be sub-optimal for pure exploration. In this work, we…
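To make the two primitives named in the abstract concrete, here is a minimal sketch of standard linear Thompson Sampling for regret minimization. It is not the algorithm proposed in the paper. It assumes a Gaussian prior N(0, I) on the parameter, Gaussian noise with known standard deviation, and that the measurement vectors and the arms are the same finite set. The names `argmax_oracle` and `thompson_sampling` are hypothetical, chosen for illustration only.

```python
import numpy as np


def argmax_oracle(arms, theta):
    """Argmax oracle: index of the arm with the largest inner product with theta."""
    return int(np.argmax(arms @ theta))


def thompson_sampling(arms, theta_star, horizon, noise_std=1.0, seed=0):
    """Linear Thompson Sampling under an assumed N(0, I) prior on theta.

    Assumption of this sketch: arms double as measurement vectors.
    Returns the arm index recommended by the final posterior mean and
    the number of times each arm was pulled.
    """
    rng = np.random.default_rng(seed)
    n_arms, d = arms.shape
    precision = np.eye(d)              # posterior precision (prior is identity)
    b = np.zeros(d)                    # running sum of x * y / noise_std**2
    pulls = np.zeros(n_arms, dtype=int)

    for _ in range(horizon):
        # Sampling oracle: draw theta ~ N(mean, precision^{-1}).
        # With precision = L L^T, the vector L^{-T} z has covariance precision^{-1}.
        mean = np.linalg.solve(precision, b)
        chol = np.linalg.cholesky(precision)
        theta_sample = mean + np.linalg.solve(chol.T, rng.standard_normal(d))

        # Argmax oracle: play the arm that is best under the sampled parameter.
        i = argmax_oracle(arms, theta_sample)
        x = arms[i]
        y = x @ theta_star + noise_std * rng.standard_normal()

        # Conjugate Gaussian posterior update.
        precision += np.outer(x, x) / noise_std**2
        b += x * y / noise_std**2
        pulls[i] += 1

    posterior_mean = np.linalg.solve(precision, b)
    return argmax_oracle(arms, posterior_mean), pulls
```

In this sketch the arm set is touched only through `argmax_oracle`, which here enumerates a small array but could in principle be any maximizer over a structured set. The loop concentrates its pulls on the arm it currently believes is best, which is the behavior suited to regret minimization rather than to pure exploration.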
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
