Sleeping Combinatorial Bandits
Kumar Abhishek, Ganesh Ghalme, Sujit Gujar, Yadati Narahari

TL;DR
This paper introduces a new algorithm, CSUCB, for sleeping combinatorial bandits, achieving logarithmic and sublinear regret bounds, and validates its effectiveness through theoretical analysis and experiments.
Contribution
It adapts the CUCB algorithm to sleeping combinatorial bandits and provides regret guarantees under general conditions, extending prior work.
Findings
Achieves $O(\log T)$ regret in certain settings.
Attains $O(\sqrt[3]{T^2 \log T})$ regret in the general case.
Validates theoretical results with experiments.
Abstract
In this paper, we study an interesting combination of sleeping and combinatorial stochastic bandits. In the mixed model studied here, at each discrete time instant, an arbitrary availability set is generated from a fixed set of base arms. An algorithm can select a subset of arms from the availability set (sleeping bandits) and receive the corresponding reward along with semi-bandit feedback (combinatorial bandits). We adapt the well-known CUCB algorithm in the sleeping combinatorial bandits setting and refer to it as CSUCB. We prove -- under mild smoothness conditions -- that the CSUCB algorithm achieves an instance-dependent regret guarantee. We further prove that (i) when the range of the rewards is bounded, the regret guarantee of CSUCB algorithm is and (ii) the instance-independent regret is …
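The interaction loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's CSUCB pseudocode: the UCB1-style index, the top-k selection rule standing in for the oracle, the additive Bernoulli rewards, and the 0.7 wake-up probability are all assumptions chosen for illustration, and the names `ucb_index`, `select_super_arm`, and `run` are hypothetical.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """UCB1-style index (assumed form); unpulled arms get +inf so they are tried first."""
    if pulls == 0:
        return math.inf
    return mean + math.sqrt(1.5 * math.log(t) / pulls)


def select_super_arm(available, means, pulls, t, k):
    """Pick up to k arms from the availability set with the largest indices.

    Top-k selection is an assumed stand-in for a combinatorial oracle.
    """
    ranked = sorted(
        available,
        key=lambda i: ucb_index(means[i], pulls[i], t),
        reverse=True,
    )
    return ranked[:k]


def run(true_means, k, horizon, seed=0):
    """Simulate the loop: availability set -> subset choice -> semi-bandit feedback."""
    rng = random.Random(seed)
    n = len(true_means)
    means = [0.0] * n   # empirical mean reward per base arm
    pulls = [0] * n     # number of times each base arm was played
    total_reward = 0.0
    for t in range(1, horizon + 1):
        # Availability set: each arm is awake with probability 0.7 (assumption).
        available = [i for i in range(n) if rng.random() < 0.7]
        if not available:
            available = [rng.randrange(n)]  # keep the set non-empty
        chosen = select_super_arm(available, means, pulls, t, k)
        # Semi-bandit feedback: the reward of every chosen arm is observed.
        for i in chosen:
            reward = 1.0 if rng.random() < true_means[i] else 0.0
            pulls[i] += 1
            means[i] += (reward - means[i]) / pulls[i]
            total_reward += reward
    return means, pulls, total_reward
```

Calling `run([0.9, 0.5, 0.1], k=2, horizon=500)` returns the empirical means, the pull counts, and the cumulative reward. Only arms in the current availability set are ever ranked, which is the sleeping part of the model; the per-arm updates inside the inner loop are the semi-bandit part.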
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
