A Fast Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

Shintaro Nakamura; Masashi Sugiyama

arXiv:2306.09202·cs.LG·January 10, 2025·1 cites

Open Access

TL;DR

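This paper introduces the combinatorial gap-based exploration (CombGapE) algorithm for the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB). Its sample complexity upper bound matches the lower bound up to a problem-dependent constant factor, and it outperforms existing methods in numerical experiments.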

Contribution

The paper presents the CombGapE algorithm for the R-CPE-MAB problem in the case where the size of the action set is polynomial in the number of arms. Its sample complexity upper bound matches the lower bound up to a problem-dependent constant factor.

Findings

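01. The sample complexity upper bound of CombGapE matches the lower bound up to a problem-dependent constant factor.

02. In numerical experiments, CombGapE significantly outperforms existing methods on both synthetic and real-world datasets.

03. The analysis covers the case where the size of the action set is polynomial in the number of arms.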

Abstract

We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB). We study the case where the size of the action set is polynomial with respect to the number of arms. In such a case, the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. We introduce an algorithm named the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound up to a problem-dependent constant factor. We numerically show that the CombGapE algorithm outperforms existing methods significantly in both synthetic and real-world datasets.
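The setting described in the abstract can be illustrated with a small sketch. It assumes the usual R-CPE-MAB formulation, which this page does not spell out: each action is a real-valued vector over the arms, the target is the action with the largest inner product with the unknown arm means, and pulling an arm returns a noisy sample of that arm's mean. The code below is a naive uniform-sampling baseline, not the CombGapE algorithm, and all names (`action_value`, `best_action`, `uniform_sampling_baseline`) are hypothetical.

```python
import random


def action_value(action, mu):
    """Inner product between an action vector and a vector of arm means."""
    return sum(a * m for a, m in zip(action, mu))


def best_action(actions, mu):
    """Index of the action with the largest inner product with mu."""
    return max(range(len(actions)), key=lambda i: action_value(actions[i], mu))


def uniform_sampling_baseline(actions, mu, pulls_per_arm, noise_std=1.0, seed=0):
    """Naive baseline (NOT CombGapE): pull every arm equally often,
    then return the empirically best action and the estimated means.

    Assumption: rewards are Gaussian with standard deviation noise_std.
    """
    rng = random.Random(seed)
    mu_hat = []
    for mean in mu:
        # Each pull of an arm yields one noisy observation of its mean.
        draws = [rng.gauss(mean, noise_std) for _ in range(pulls_per_arm)]
        mu_hat.append(sum(draws) / pulls_per_arm)
    return best_action(actions, mu_hat), mu_hat


if __name__ == "__main__":
    # Three arms and a small, explicitly enumerated action set.
    actions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 1.0]]
    mu = [1.0, 0.2, 0.1]  # hidden from the learner in a real problem
    guess, estimate = uniform_sampling_baseline(actions, mu, pulls_per_arm=200)
    print("true best:", best_action(actions, mu), "guess:", guess)
```

Because the action set is enumerated explicitly, the final selection step costs time linear in the number of actions, which is practical only when that number is polynomial in the number of arms. An adaptive method such as CombGapE would choose which arm to pull next instead of sampling uniformly.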

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems

Methods: Collaborative Preference Embedding · Focus