Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

Shintaro Nakamura; Masashi Sugiyama

arXiv:2308.10238 · cs.LG · November 16, 2023

TL;DR

This paper introduces GenTS-Explore, a new algorithm for real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) that can handle exponentially large action sets while achieving near-optimal sample complexity.

Contribution

The paper proposes the first R-CPE-MAB algorithm that remains efficient when the action set is exponentially large in $d$, and it derives a new problem-dependent sample-complexity lower bound for the problem.

Findings

1. GenTS-Explore works with exponentially large action sets.

2. The algorithm achieves near-optimal sample complexity.

3. A new lower bound for R-CPE-MAB was derived.

Abstract

We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given $d$ stochastic arms, and the reward of each arm $s \in \{1, \dots, d\}$ follows an unknown distribution with mean $\mu_{s}$. In each time step, the player pulls a single arm and observes its reward. The player's goal is to identify the optimal action $\pi^{*} = \operatorname{arg\,max}_{\pi \in \mathcal{A}} \mu^{\top} \pi$ from a finite-sized real-valued action set $\mathcal{A} \subset \mathbb{R}^{d}$ with as few arm pulls as possible. Previous methods for R-CPE-MAB assume that the size of the action set $\mathcal{A}$ is polynomial in $d$. We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the…
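The problem setup described in the abstract can be illustrated with a small sketch. This is not GenTS-Explore: it is a naive uniform-sampling baseline that enumerates the whole action set, which is only feasible when $\mathcal{A}$ is small. The function names (`best_action`, `uniform_explore`, `pull`), the Gaussian reward model, and all numeric values are assumptions made for illustration.

```python
import numpy as np

def best_action(mu, actions):
    """Brute-force arg max of mu^T pi over an explicitly listed action set."""
    actions = np.asarray(actions, dtype=float)
    scores = actions @ np.asarray(mu, dtype=float)
    idx = int(np.argmax(scores))
    return idx, actions[idx]

def uniform_explore(pull, d, actions, pulls_per_arm):
    """Naive baseline (assumption, not the paper's method):
    pull every arm equally often, then plug the empirical means into the arg max."""
    mu_hat = np.array(
        [np.mean([pull(s) for _ in range(pulls_per_arm)]) for s in range(d)]
    )
    return best_action(mu_hat, actions), mu_hat

# Hypothetical instance: d = 3 arms, three real-valued actions.
rng = np.random.default_rng(0)
true_mu = np.array([0.9, 0.2, 0.5])
actions = [[1.0, 0.0, 0.0], [0.0, 1.5, 0.5], [0.3, 0.3, 0.4]]

def pull(s):
    # Assumed reward model: Gaussian noise around the unknown mean of arm s.
    return rng.normal(true_mu[s], 0.1)

(idx_hat, pi_hat), mu_hat = uniform_explore(pull, 3, actions, pulls_per_arm=200)
idx_star, pi_star = best_action(true_mu, actions)
print("estimated best action:", idx_hat, "true best action:", idx_star)
```

The baseline spends the same number of pulls on every arm and scores every action explicitly. Both choices become wasteful as $d$ and $|\mathcal{A}|$ grow, which is the regime the paper targets.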

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems