Continuous K-Max Bandits
Yu Chen, Siwei Wang, Longbo Huang, Wei Chen

TL;DR
This paper introduces a novel algorithm for continuous K-Max bandits, addressing challenges like discretization error and partial feedback, and provides regret bounds demonstrating its effectiveness in various settings.
Contribution
The paper proposes DCK-UCB, an efficient algorithm with sublinear regret guarantees for continuous K-Max bandits, including a near-optimal solution for exponential distributions with full feedback.
Findings
DCK-UCB achieves $\tilde{O}(T^{3/4})$ regret for general distributions.
MLE-Exp attains $\tilde{O}(\sqrt{T})$ regret for exponential distributions with full feedback.
First sublinear regret guarantee for continuous K-Max bandits.
Abstract
We study the K-Max combinatorial multi-armed bandits problem with continuous outcome distributions and weak value-index feedback: each base arm has an unknown continuous outcome distribution, and in each round the learning agent selects K arms, obtains the maximum value sampled from these arms as reward, and observes this reward together with the corresponding arm index as feedback. This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc. Continuous K-Max bandits introduce unique challenges, including discretization error from continuous-to-discrete conversion, non-deterministic tie-breaking under limited feedback, and biased estimation due to partial observability. Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds…
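The value-index feedback model described in the abstract can be illustrated with a small simulation. This is only a sketch of the feedback model, not the paper's DCK-UCB algorithm: the helper `pull_k_max`, the exponential arm distributions, and the rates below are all hypothetical choices made for illustration. It also shows why naive estimation is biased: an arm's value is observed only in rounds where it produced the maximum.

```python
import random


def pull_k_max(samplers, chosen, rng):
    """Simulate one round of value-index feedback (hypothetical helper).

    Draws one sample from each chosen arm and returns the maximum value
    together with the index of the arm that produced it. The other draws
    stay hidden from the learner.
    """
    draws = {i: samplers[i](rng) for i in chosen}
    winner = max(draws, key=draws.get)
    return draws[winner], winner


# Assumed setup: three arms with exponential outcomes (rates are illustrative).
rates = [1.0, 2.0, 4.0]
samplers = [lambda rng, r=r: rng.expovariate(r) for r in rates]

rng = random.Random(0)
observed = {i: [] for i in range(len(rates))}
for _ in range(20000):
    # Always play arms 0 and 1 (K = 2) and record only what the learner sees.
    reward, idx = pull_k_max(samplers, chosen=(0, 1), rng=rng)
    observed[idx].append(reward)

# Averaging the values seen for arm 1 overestimates its true mean 1 / rates[1],
# because arm 1 is only observed in rounds where it beat arm 0.
naive_mean_arm1 = sum(observed[1]) / len(observed[1])
true_mean_arm1 = 1.0 / rates[1]
print(f"naive estimate: {naive_mean_arm1:.3f}, true mean: {true_mean_arm1:.3f}")
```

In this assumed setup, the mean of arm 1 conditioned on winning is 5/6, well above its true mean of 1/2, which is the kind of bias that the abstract attributes to partial observability.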
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques
Methods: Balanced Selection
