Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit
Shintaro Nakamura, Masashi Sugiyama

TL;DR
This paper introduces GenTS-Explore, a new algorithm for real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) that can handle exponentially large action sets and achieves near-optimal sample complexity.
Contribution
The paper proposes the first algorithm that efficiently handles exponentially large action sets in R-CPE-MAB and establishes a matching problem-dependent lower bound.
Findings
GenTS-Explore works with exponentially large action sets.
The algorithm achieves near-optimal sample complexity.
A new problem-dependent lower bound for R-CPE-MAB is derived.
Abstract
We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given $d$ stochastic arms, and the reward of each arm $s \in \{1, \ldots, d\}$ follows an unknown distribution with mean $\mu_s$. In each time step, a player pulls a single arm and observes its reward. The player's goal is to identify the optimal \emph{action} from a finite-sized real-valued \emph{action set} with as few arm pulls as possible. Previous methods in the R-CPE-MAB assume that the size of the action set is polynomial in $d$. We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the…
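To make the problem setting concrete, here is a minimal Python sketch of a naive uniform-sampling baseline. It is not GenTS-Explore and is not taken from the paper. It assumes unit-variance Gaussian rewards, an action set small enough to enumerate, and that an action is scored by its inner product with the vector of arm means. The function name `uniform_explore` and all numbers are hypothetical.

```python
import random


def uniform_explore(mu, actions, pulls_per_arm, seed=0):
    """Naive baseline (NOT GenTS-Explore): pull every arm equally often,
    then return the index of the action that looks best empirically.

    Assumptions for illustration only: rewards are Gaussian with unit
    variance, and an action's value is its inner product with the means.
    """
    rng = random.Random(seed)
    mu_hat = []
    for mean in mu:
        # Pull this arm pulls_per_arm times and average the noisy rewards.
        rewards = [rng.gauss(mean, 1.0) for _ in range(pulls_per_arm)]
        mu_hat.append(sum(rewards) / pulls_per_arm)
    # Score each real-valued action against the estimated means.
    scores = [sum(m * p for m, p in zip(mu_hat, action)) for action in actions]
    best = max(range(len(actions)), key=scores.__getitem__)
    return best, mu_hat


# Hypothetical instance: 3 arms and 3 real-valued actions.
mu = [0.9, 0.5, 0.1]
actions = [[1.0, 0.0, 0.5], [0.0, 1.5, 0.0], [0.3, 0.3, 0.3]]
best_est, mu_hat = uniform_explore(mu, actions, pulls_per_arm=5000)
print(best_est, [round(m, 2) for m in mu_hat])
```

This baseline enumerates every action to score it, which is exactly what becomes infeasible when the action set is exponentially large. It also spends the same number of pulls on every arm, regardless of how much each arm matters for separating the best action from the rest.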
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
