Continuous K-Max Bandits

Yu Chen; Siwei Wang; Longbo Huang; Wei Chen

arXiv:2502.13467 · cs.LG · February 20, 2025

TL;DR

This paper introduces an algorithm for continuous K-Max bandits that addresses challenges such as discretization error and partial feedback, and proves regret bounds both for general continuous distributions and for the special case of exponential distributions.
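A toy case shows what discretization error means here. The sketch below assumes exponential arms and a fixed uniform grid of width `delta`. It is not the adaptive discretization used by DCK-UCB, and the function names are hypothetical. It compares the exact expected maximum of three i.i.d. Exponential(1) outcomes, $H_3 = 11/6$, with the value obtained after rounding the realized maximum down to the grid.

```python
import math
from typing import Sequence


def exp_cdf(x: float, rate: float) -> float:
    """CDF of an Exponential(rate) outcome, evaluated at x >= 0."""
    return 1.0 - math.exp(-rate * x)


def expected_max_iid_exp(k: int, rate: float) -> float:
    """Exact E[max of k i.i.d. Exponential(rate)] = H_k / rate (H_k = k-th harmonic number)."""
    return sum(1.0 / j for j in range(1, k + 1)) / rate


def expected_max_discretized(rates: Sequence[float], delta: float, x_max: float) -> float:
    """E[max] after rounding the realized max down to a uniform grid of width delta.

    Uses floor(Y / delta) * delta = delta * sum_{m >= 1} 1{Y >= m * delta} and, for
    independent continuous arms, P(max >= x) = 1 - prod_i F_i(x).
    The infinite sum is truncated at x_max (an illustrative cutoff for the tail).
    """
    total = 0.0
    for m in range(1, int(x_max / delta) + 1):
        prob_all_below = math.prod(exp_cdf(m * delta, r) for r in rates)
        total += 1.0 - prob_all_below
    return delta * total


if __name__ == "__main__":
    exact = expected_max_iid_exp(3, 1.0)
    for delta in (0.5, 0.1, 0.01):
        approx = expected_max_discretized([1.0, 1.0, 1.0], delta, x_max=50.0)
        print(f"delta={delta}: exact={exact:.4f} grid={approx:.4f} error={exact - approx:.4f}")
```

Rounding the maximum down loses between $0$ and $\delta$ in each round, so a finer grid shrinks the bias. In a learning algorithm, however, a finer grid also leaves more cells to estimate, and an adaptive scheme can be seen as one way to balance the two.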

Contribution

The paper proposes DCK-UCB, a computationally efficient algorithm with sublinear regret guarantees for continuous K-Max bandits. It also proposes MLE-Exp, a near-optimal algorithm for exponential distributions under full-bandit feedback.

Findings

01

DCK-UCB achieves $\tilde{O}(T^{3/4})$ regret for general distributions.

02

MLE-Exp attains $\tilde{O}(\sqrt{T})$ regret for exponential distributions under full-bandit feedback.

03

First sublinear regret guarantee for continuous K-Max bandits.
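A rough worked comparison makes the two rates concrete. It ignores constants and the logarithmic factors hidden by $\tilde{O}$, and it assumes the standard cumulative-regret definition for combinatorial bandits. In that definition, $S^{*}$ is an optimal set of $K$ arms and $r(S)$ is the expected K-Max reward of set $S$. This notation is illustrative and is not taken from the paper.

```latex
\mathrm{Reg}(T) = \sum_{t=1}^{T} \bigl( r(S^{*}) - \mathbb{E}[r(S_t)] \bigr),
\qquad
\frac{T^{3/4}}{T} = T^{-1/4},
\qquad
\frac{T^{1/2}}{T} = T^{-1/2}.

% Example at T = 10^8 rounds:
% T^{3/4} = 10^6, so the average regret per round is T^{-1/4} = 10^{-2}
% T^{1/2} = 10^4, so the average regret per round is T^{-1/2} = 10^{-4}
```

Both per-round averages go to zero as $T$ grows, which is what "sublinear" means. The $\sqrt{T}$ rate shrinks faster.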

Abstract

We study the $K$-Max combinatorial multi-armed bandits problem with continuous outcome distributions and weak value-index feedback: each base arm has an unknown continuous outcome distribution, and in each round the learning agent selects $K$ arms, obtains the maximum value sampled from these $K$ arms as reward and observes this reward together with the corresponding arm index as feedback. This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc. The continuous $K$-Max bandits introduce unique challenges, including discretization error from continuous-to-discrete conversion, non-deterministic tie-breaking under limited feedback, and biased estimation due to partial observability. Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds…
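A small simulation shows the value-index feedback and the estimation bias it causes. This is a minimal sketch that assumes exponential base arms. `kmax_round` and `naive_winner_means` are hypothetical helpers. The naive estimator is included only to expose the bias and does not reproduce DCK-UCB's bias-corrected confidence bounds.

```python
import random
from typing import Dict, Sequence, Tuple


def kmax_round(rng: random.Random, rates: Sequence[float], chosen: Sequence[int]) -> Tuple[float, int]:
    """Play one K-Max round and return value-index feedback (max value, winning arm).

    Each chosen arm draws an Exponential(rate) outcome (an illustrative assumption);
    the other K - 1 samples are discarded, mirroring the partial observability.
    """
    samples = [(rng.expovariate(rates[i]), i) for i in chosen]
    value, winner = max(samples)  # ties have probability zero for continuous outcomes
    return value, winner


def naive_winner_means(
    rng: random.Random, rates: Sequence[float], chosen: Sequence[int], rounds: int
) -> Dict[int, float]:
    """Average each arm's value over the rounds it won (a deliberately naive estimator)."""
    sums = {i: 0.0 for i in chosen}
    counts = {i: 0 for i in chosen}
    for _ in range(rounds):
        value, winner = kmax_round(rng, rates, chosen)
        sums[winner] += value
        counts[winner] += 1
    return {i: sums[i] / counts[i] for i in chosen if counts[i] > 0}


if __name__ == "__main__":
    rates = [1.0, 1.0, 2.0]  # true means 1.0, 1.0, 0.5
    estimates = naive_winner_means(random.Random(0), rates, [0, 1, 2], rounds=20_000)
    for arm, est in estimates.items():
        print(f"arm {arm}: true mean {1.0 / rates[arm]:.3f}, naive estimate {est:.3f}")
```

Every naive estimate lands above the true mean, because an arm's value is observed only when it beats the other selected arms. For arm 2, the winner-conditioned mean works out analytically to $13/12 \approx 1.083$, against a true mean of $0.5$.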

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques

Methods: Balanced Selection