Pure Exploration Bandit Problem with General Reward Functions Depending on Full Distributions
Siwei Wang, Wei Chen

TL;DR
This paper extends pure exploration bandit models to cases where rewards depend on entire distributions, proposing algorithms with guarantees and analyzing their sample complexity for broader distribution-dependent reward functions.
Contribution
It introduces algorithms for pure exploration bandits with rewards based on full distributions, extending existing frameworks and providing theoretical guarantees.
Findings
Algorithms with correctness guarantees for distribution-dependent rewards
Sample complexity upper bounds for the proposed algorithms
Application discussions for various distribution-based reward functions
Abstract
In this paper, we study the pure exploration bandit model on general distribution functions, in which the reward function of each arm depends on the arm's whole distribution, not only its mean. We adapt the racing framework and the LUCB framework to solve this problem, and design algorithms for estimating the value of the reward functions with different types of distributions. We then show that our estimation methods have correctness guarantees with proper parameters, and obtain sample complexity upper bounds for them. Finally, we discuss some important applications and their corresponding solutions under our learning framework.
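To make the setting concrete, here is a minimal sketch of a racing-style (successive elimination) procedure in which each arm's value is a quantile of its reward distribution rather than its mean. This is an illustration only, not the authors' algorithm: the choice of a quantile as the reward function, the confidence band from the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, the union-bound schedule `delta / (2 * k * n * n)`, and all function names are assumptions made for this example.

```python
import bisect
import math
import random


def dkw_radius(n, delta):
    """Band half-width eps such that P(sup_x |F_n(x) - F(x)| > eps) <= delta (DKW)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def quantile_bounds(sorted_samples, tau, eps):
    """Confidence bounds on the tau-quantile, valid whenever the true CDF lies
    within eps of the empirical CDF everywhere."""
    n = len(sorted_samples)
    lo_level, hi_level = tau - eps, tau + eps
    if lo_level <= 0:
        lo = -math.inf  # band too wide to say anything on the low side
    else:
        lo = sorted_samples[max(math.ceil(lo_level * n) - 1, 0)]
    if hi_level >= 1:
        hi = math.inf  # band too wide to say anything on the high side
    else:
        hi = sorted_samples[min(math.ceil(hi_level * n) - 1, n - 1)]
    return lo, hi


def race_best_quantile_arm(samplers, tau=0.5, delta=0.05, max_rounds=20000):
    """Racing sketch: pull every surviving arm once per round and eliminate an
    arm once its upper bound falls below another arm's lower bound.

    `samplers` is a list of zero-argument callables, one per arm (hypothetical
    interface). Returns (index of the selected arm, rounds used).
    """
    k = len(samplers)
    active = list(range(k))
    data = [[] for _ in range(k)]  # per-arm samples, kept sorted
    for n in range(1, max_rounds + 1):
        # Shrinking per-round failure probability so that a union bound over
        # all arms and rounds stays below delta (an assumed schedule).
        eps = dkw_radius(n, delta / (2.0 * k * n * n))
        bounds = {}
        for i in active:
            bisect.insort(data[i], samplers[i]())
            bounds[i] = quantile_bounds(data[i], tau, eps)
        best_lower = max(lo for lo, _ in bounds.values())
        active = [i for i in active if bounds[i][1] >= best_lower]
        if len(active) == 1:
            return active[0], n
    # Budget exhausted: fall back to the best empirical quantile among survivors.
    def empirical_quantile(i):
        return data[i][min(int(tau * len(data[i])), len(data[i]) - 1)]
    return max(active, key=empirical_quantile), max_rounds
```

Calling `race_best_quantile_arm` with, for example, three Gaussian samplers of different medians returns the index of the arm whose median it judges largest together with the number of rounds used. An LUCB-style variant would keep the same confidence bounds but, in each round, sample only the empirically best arm and its strongest challenger instead of all surviving arms.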
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics
