Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints
Zhiming Huang, Yifan Xu, Bingshan Hu, Qipeng Wang, Jianping Pan

TL;DR
This paper introduces TSCSF-B, a Thompson Sampling-based algorithm for combinatorial sleeping semi-bandits with long-term fairness constraints, providing theoretical guarantees and demonstrating effectiveness in a movie recommendation application.
Contribution
The paper develops TSCSF-B, a novel Thompson Sampling algorithm that handles fairness constraints in combinatorial sleeping semi-bandits, with proven regret bounds and practical validation.
Findings
TSCSF-B provably satisfies the long-term fairness constraints.
The regret bound is tight and problem-independent when the fairness constraints are relaxed.
Numerical experiments confirm its effectiveness in a movie recommendation application.
Abstract
We study the combinatorial sleeping multi-armed semi-bandit problem with long-term fairness constraints (CSMAB-F). To address the problem, we adopt Thompson Sampling (TS) to maximize the total rewards and use virtual queue techniques to handle the fairness constraints, and design an algorithm called TS with beta priors and Bernoulli likelihoods for CSMAB-F (TSCSF-B). Further, we prove TSCSF-B can satisfy the fairness constraints, and the time-averaged regret is upper bounded by $\frac{N}{2\eta} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$, where $N$ is the total number of arms, $m$ is the maximum number of arms that can be pulled simultaneously in each round (the cardinality constraint) and $\eta$ is the parameter trading off fairness for rewards. By relaxing the fairness constraints (i.e., let $\eta \rightarrow \infty$), the bound boils down to the first problem-independent bound…
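The abstract names three ingredients: Thompson Sampling with beta priors and Bernoulli likelihoods, virtual queues for the fairness constraints, and a cardinality limit on how many arms are pulled per round. The following is a minimal sketch of how these pieces could fit together, not the authors' exact algorithm. The function name `tscsf_b_sketch`, the independent availability model (`avail_prob`), the score `queue / eta + theta`, and the queue update rule are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def tscsf_b_sketch(means, min_frac, m, eta, horizon, avail_prob=0.8, seed=0):
    """Illustrative TS + virtual-queue loop (assumed details, see lead-in).

    means:    true Bernoulli reward means, used only to simulate feedback
    min_frac: assumed minimum long-term selection fraction per arm
    m:        cardinality constraint (max arms pulled per round)
    eta:      positive trade-off parameter; larger values relax fairness
    Returns (pull_counts, final_queues).
    """
    rng = random.Random(seed)
    n = len(means)
    alpha = [1.0] * n  # Beta posterior parameter: 1 + observed successes
    beta = [1.0] * n   # Beta posterior parameter: 1 + observed failures
    queue = [0.0] * n  # virtual queues accumulating unmet fairness demand
    pulls = [0] * n

    for _ in range(horizon):
        # Sleeping arms: each arm is awake independently (assumed model).
        awake = [i for i in range(n) if rng.random() < avail_prob]

        # Thompson Sampling: draw one posterior sample per awake arm.
        theta = {i: rng.betavariate(alpha[i], beta[i]) for i in awake}

        # Rank awake arms by fairness debt plus sampled reward and keep
        # at most m of them (assumed scoring rule).
        ranked = sorted(awake, key=lambda i: queue[i] / eta + theta[i],
                        reverse=True)
        chosen = ranked[:m]

        # Semi-bandit feedback: observe a reward for every pulled arm.
        for i in chosen:
            reward = 1 if rng.random() < means[i] else 0
            alpha[i] += reward
            beta[i] += 1 - reward
            pulls[i] += 1

        # Queue update: grow by the required fraction, shrink when served.
        chosen_set = set(chosen)
        for i in range(n):
            served = 1.0 if i in chosen_set else 0.0
            queue[i] = max(queue[i] + min_frac[i] - served, 0.0)

    return pulls, queue
```

Calling, for example, `tscsf_b_sketch([0.9, 0.8, 0.2, 0.1], [0.1, 0.1, 0.3, 0.3], m=2, eta=10.0, horizon=2000)` returns per-arm pull counts and final queue lengths. Under this update rule, an arm's pull count plus its final queue length is never below its required fraction times the horizon, so a queue that stays small indicates the fairness requirement is close to being met.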
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Mobile Crowdsensing and Crowdsourcing
Methods: Spatio-temporal stability analysis
