Worst-Case Regret Bounds for Combinatorial Thompson Sampling in Sleeping Semi-Bandits
Zhiming Huang, Bingshan Hu, Jianping Pan

TL;DR
This paper provides the first worst-case regret bounds for combinatorial Thompson sampling in sleeping semi-bandits, introduces a new variant CL-SG with improved theoretical guarantees, and demonstrates its superior empirical performance.
Contribution
It offers the first worst-case regret analysis for CTS-G in sleeping semi-bandits, proposes CL-SG with better bounds, and validates its effectiveness through experiments.
Findings
Established worst-case regret bounds of $\tilde{O}(m\sqrt{NT})$ for CTS-G.
Proposed CL-SG, a variant of CTS-G that achieves an improved regret bound.
Experimental results show CL-SG outperforms CTS-G and CTS-B on real datasets.
Abstract
We revisit combinatorial Thompson sampling (CTS) for semi-bandits with sleeping arms, where arm availability varies over time and actions must satisfy combinatorial constraints, as in wireless mesh routing with fluctuating link availability. Despite its practical relevance, CTS has been hindered by several long-standing problems: (i) the absence of worst-case regret guarantees in the semi-bandit setting even without sleeping arms, (ii) the lack of theory under adversarially varying availability, and (iii) the consistently weak empirical performance of CTS with Gaussian priors (CTS-G). This paper resolves these long-standing issues by providing the first worst-case regret analysis of CTS-G, proving an upper bound and a matching lower bound. To bridge the gap between theory and practice, we further propose CL-SG, a simple CTS-G…
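To make the setting in the abstract concrete, here is a minimal sketch of Gaussian combinatorial Thompson sampling with sleeping arms. It is an illustration under stated assumptions, not the authors' CTS-G or CL-SG. The function name `cts_gaussian_sleeping`, the Bernoulli rewards, the i.i.d. availability model (`avail_prob`), the "pick at most m awake arms" constraint, and the N(mean, 1/(pulls+1)) sampling rule are all choices made for this example. Regret is measured per round against the best action among the arms that are awake, which is the usual convention for sleeping bandits.

```python
import math
import random


def cts_gaussian_sleeping(means, m, horizon, avail_prob=0.7, seed=0):
    """Illustrative Gaussian CTS for a sleeping semi-bandit.

    Assumptions made for this sketch (not taken from the paper):
    Bernoulli rewards, i.i.d. arm availability, and a top-m oracle
    standing in for the combinatorial constraint.
    """
    rng = random.Random(seed)
    n = len(means)
    pulls = [0] * n      # number of observations per arm
    totals = [0.0] * n   # sum of observed rewards per arm
    regret = 0.0

    for _ in range(horizon):
        # Arms that are awake (available) in this round.
        awake = [i for i in range(n) if rng.random() < avail_prob]
        if not awake:
            continue  # nothing can be played this round
        k = min(m, len(awake))

        # Draw one Gaussian sample per awake arm; the standard deviation
        # shrinks as the arm accumulates observations.
        theta = {}
        for i in awake:
            mean_hat = totals[i] / (pulls[i] + 1)
            theta[i] = rng.gauss(mean_hat, 1.0 / math.sqrt(pulls[i] + 1))

        # Oracle step: play the k awake arms with the largest samples.
        chosen = sorted(awake, key=lambda i: theta[i], reverse=True)[:k]

        # Semi-bandit feedback: observe the reward of every played arm.
        for i in chosen:
            reward = 1.0 if rng.random() < means[i] else 0.0
            pulls[i] += 1
            totals[i] += reward

        # Pseudo-regret against the best awake action in this round.
        best = sorted((means[i] for i in awake), reverse=True)[:k]
        regret += sum(best) - sum(means[i] for i in chosen)

    return regret, pulls


if __name__ == "__main__":
    total_regret, counts = cts_gaussian_sleeping(
        [0.9, 0.8, 0.5, 0.2, 0.1], m=2, horizon=5000
    )
    print("cumulative pseudo-regret:", round(total_regret, 2))
    print("pulls per arm:", counts)
```

Running the script prints the cumulative pseudo-regret and how often each arm was played. Replacing the top-m oracle with a path-finding routine would move the sketch closer to the routing example mentioned in the abstract.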
