Pure Exploration Bandit Problem with General Reward Functions Depending on Full Distributions

Siwei Wang; Wei Chen

arXiv:2105.03598·cs.LG·May 11, 2021

Open Access

TL;DR

This paper extends pure exploration bandit models to cases where rewards depend on entire distributions, proposing algorithms with guarantees and analyzing their sample complexity for broader distribution-dependent reward functions.

Contribution

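The paper adapts the racing and LUCB frameworks to pure exploration bandits whose arm rewards depend on the full reward distribution rather than only its mean, and it provides correctness guarantees and sample complexity upper bounds for the resulting algorithms.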

Findings

01. Algorithms with correctness guarantees for distribution-dependent rewards

02. Sample complexity upper bounds for the proposed algorithms

03. Application discussions for various distribution-based reward functions

Abstract

In this paper, we study the pure exploration bandit model on general distribution functions, meaning that the reward function of each arm depends on its whole distribution, not only on its mean. We adapt the racing framework and the LUCB framework to solve this problem, and design algorithms for estimating the value of the reward functions under different types of distributions. We then show that our estimation methods have correctness guarantees with proper parameters, and obtain sample complexity upper bounds for them. Finally, we discuss some important applications and their corresponding solutions under our learning framework.
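To make the setting in the abstract concrete, here is a minimal sketch of a racing-style (successive elimination) loop in which each arm's reward is a quantile of its distribution rather than its mean. This is not the paper's algorithm: it is a generic illustration that assumes a quantile reward, uses the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality to build a confidence band on the empirical CDF, and applies a simple union bound over arms and rounds. All function names and parameters (`dkw_radius`, `quantile_bounds`, `race_best_quantile`, `max_rounds`) are hypothetical.

```python
import math
import random


def dkw_radius(n, delta):
    """DKW half-width: sup |F_n - F| <= radius with probability >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def quantile_bounds(samples, tau, eps):
    """Interval for the tau-quantile implied by a CDF band of half-width eps.

    The lower end is the empirical (tau - eps)-quantile and the upper end is
    the empirical (tau + eps)-quantile; a level outside (0, 1) gives an
    infinite (uninformative) end.
    """
    xs = sorted(samples)
    n = len(xs)
    lo_level, hi_level = tau - eps, tau + eps
    lo = xs[math.ceil(lo_level * n) - 1] if lo_level > 0 else -math.inf
    hi = xs[math.ceil(hi_level * n) - 1] if hi_level < 1 else math.inf
    return lo, hi


def race_best_quantile(arms, tau=0.5, delta=0.05, max_rounds=5000):
    """Racing sketch: pull every active arm once per round, then drop arms
    whose upper bound falls below the best lower bound.

    `arms` is a list of zero-argument samplers (an assumption of this sketch).
    Returns (index of the surviving arm, number of rounds used).
    """
    k = len(arms)
    samples = [[] for _ in range(k)]
    active = set(range(k))
    for t in range(1, max_rounds + 1):
        for i in active:
            samples[i].append(arms[i]())
        # Union bound over k arms and all rounds: sum_t delta / (2 k t^2) < delta / k.
        eps = dkw_radius(t, delta / (2.0 * k * t * t))
        bounds = {i: quantile_bounds(samples[i], tau, eps) for i in active}
        best_lower = max(lo for lo, _ in bounds.values())
        active = {i for i in active if bounds[i][1] >= best_lower}
        if len(active) == 1:
            return next(iter(active)), t
    # Budget exhausted (illustrative fallback): pick the best empirical quantile.
    best = max(active, key=lambda i: quantile_bounds(samples[i], tau, 0.0)[0])
    return best, max_rounds
```

For example, calling `race_best_quantile` on three Gaussian samplers with well-separated medians should return the index of the arm with the largest median once the confidence intervals separate. Other distribution-dependent rewards would need their own interval construction in place of `quantile_bounds`.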

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics