Combinatorial Pure Exploration with Continuous and Separable Reward Functions and Its Applications (Extended Version)

Weiran Huang; Jungseul Ok; Liang Li; Wei Chen

arXiv:1805.01685 · cs.LG · May 7, 2018 · 1 citation

TL;DR

This paper introduces a new algorithm for the combinatorial pure exploration problem with continuous, separable reward functions in stochastic bandits, providing bounds on sample complexity and handling non-linear rewards.

Contribution

It proposes an adaptive learning algorithm for CPE-CS, introduces the consistent optimality hardness measure, and establishes upper and lower bounds on sample complexity.

Findings

01. The algorithm achieves near-optimal sample complexity bounds.

02. The hardness measure effectively captures problem difficulty.

03. The method handles non-linear reward functions successfully.

Abstract

We study the Combinatorial Pure Exploration problem with Continuous and Separable reward functions (CPE-CS) in the stochastic multi-armed bandit setting. In a CPE-CS instance, we are given several stochastic arms with unknown distributions, as well as a collection of possible decisions. Each decision has a reward according to the distributions of arms. The goal is to identify the decision with the maximum reward, using as few arm samples as possible. The problem generalizes the combinatorial pure exploration problem with linear rewards, which has attracted significant attention in recent years. In this paper, we propose an adaptive learning algorithm for the CPE-CS problem, and analyze its sample complexity. In particular, we introduce a new hardness measure called the consistent optimality hardness, and give both the upper and lower bounds of sample complexity. Moreover, we give…
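To make the problem setting concrete, here is a minimal Python sketch of a toy instance. It is not the adaptive algorithm proposed in the paper: it uses a naive uniform-sampling baseline that pulls every arm the same number of times and then picks the decision with the highest estimated reward. The sketch assumes that "separable" means the reward of a decision is a sum of per-arm terms, each depending only on that arm's mean and the matching coordinate of the decision. The arm means, the decision set, the reward function, and all function names are hypothetical and chosen only for illustration.

```python
import random


def separable_reward(theta, decision, per_arm_reward):
    """Reward of a decision, assumed to be a sum of per-arm terms."""
    return sum(per_arm_reward(t, y) for t, y in zip(theta, decision))


def uniform_sampling_baseline(sample_arm, num_arms, decisions,
                              per_arm_reward, samples_per_arm):
    """Naive baseline (NOT the paper's algorithm): sample every arm
    equally, then return the decision with the best estimated reward."""
    estimates = []
    for i in range(num_arms):
        draws = [sample_arm(i) for _ in range(samples_per_arm)]
        estimates.append(sum(draws) / samples_per_arm)
    best = max(
        decisions,
        key=lambda y: separable_reward(estimates, y, per_arm_reward),
    )
    return best, estimates


# Hypothetical toy instance: three Bernoulli arms whose means are
# unknown to the learner.
rng = random.Random(0)
true_means = [0.2, 0.8, 0.5]


def sample_arm(i):
    """Draw one Bernoulli sample from arm i."""
    return 1.0 if rng.random() < true_means[i] else 0.0


# Hypothetical finite decision set; each decision assigns a weight per arm.
decisions = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (0.5, 0.5, 0.0),
]


def per_arm_reward(theta_i, y_i):
    """Hypothetical per-arm term, continuous and non-linear in y_i."""
    return theta_i * (y_i ** 0.5)


best, estimates = uniform_sampling_baseline(
    sample_arm, len(true_means), decisions, per_arm_reward,
    samples_per_arm=2000,
)
true_best = max(
    decisions,
    key=lambda y: separable_reward(true_means, y, per_arm_reward),
)
```

In this toy instance the baseline spends the same budget on every arm, even on arms whose estimates are already precise enough to rule decisions out. An adaptive algorithm of the kind the abstract describes would instead aim to concentrate samples where they help distinguish the best decision.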

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics