Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints

Zhiming Huang; Yifan Xu; Bingshan Hu; Qipeng Wang; Jianping Pan

arXiv:2005.06725·cs.LG·May 15, 2020·5 cites

TL;DR

This paper introduces TSCSF-B, a Thompson Sampling-based algorithm for combinatorial sleeping semi-bandits with long-term fairness constraints, providing theoretical guarantees and demonstrating effectiveness in a movie recommendation application.

Contribution

The paper develops TSCSF-B, a novel Thompson Sampling algorithm that handles fairness constraints in combinatorial sleeping semi-bandits, with proven regret bounds and practical validation.

Findings

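01 TSCSF-B is proven to satisfy the long-term fairness constraints.

02 When the fairness constraints are relaxed ($η \to \infty$), the regret bound reduces to a problem-independent bound.

03 Numerical experiments, including a movie recommendation application, support the algorithm's effectiveness.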

Abstract

We study the combinatorial sleeping multi-armed semi-bandit problem with long-term fairness constraints (CSMAB-F). To address the problem, we adopt Thompson Sampling (TS) to maximize the total rewards and use virtual queue techniques to handle the fairness constraints, and design an algorithm called TS with beta priors and Bernoulli likelihoods for CSMAB-F (TSCSF-B). Further, we prove that TSCSF-B can satisfy the fairness constraints and that the time-averaged regret is upper bounded by $\frac{N}{2η} + O\left(\frac{\sqrt{mNT \ln T}}{T}\right)$, where $N$ is the total number of arms, $m$ is the maximum number of arms that can be pulled simultaneously in each round (the cardinality constraint), and $η$ is the parameter trading off fairness for rewards. By relaxing the fairness constraints (i.e., letting $η \to \infty$), the bound boils down to the first problem-independent bound…
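The abstract names the ingredients (Beta priors, Bernoulli feedback, virtual queues, a cardinality limit $m$, and a trade-off parameter $η$) but not the update rules. The following is a minimal sketch of how these pieces could fit together, not the authors' exact algorithm. It assumes a Lyapunov-style queue update, a selection score of queue length plus $η$ times the sampled mean, and independent random arm availability; the function name `run_ts_fair_sketch` and all of its parameters are hypothetical.

```python
import numpy as np

def run_ts_fair_sketch(means, min_frac, avail_prob, m, eta, horizon, seed=0):
    """Toy simulation: Thompson Sampling with virtual queues (assumed form).

    means      : true Bernoulli reward mean of each arm (simulation only)
    min_frac   : required long-term selection fraction per arm (assumed form
                 of the fairness constraint)
    avail_prob : probability that an arm is awake in a round (assumption)
    m          : maximum number of arms pulled per round
    eta        : weight on sampled reward relative to queue length
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    min_frac = np.asarray(min_frac, dtype=float)
    n = len(means)
    alpha = np.ones(n)        # Beta(1, 1) prior: successes + 1
    beta = np.ones(n)         # failures + 1
    queue = np.zeros(n)       # virtual queues tracking fairness "debt"
    pulls = np.zeros(n, dtype=int)

    for _ in range(horizon):
        awake = rng.random(n) < avail_prob
        selected = np.zeros(n)
        if awake.any():
            theta = rng.beta(alpha, beta)            # posterior samples
            # Sleeping arms get -inf so they are never chosen.
            score = np.where(awake, queue + eta * theta, -np.inf)
            k = min(m, int(awake.sum()))
            chosen = np.argsort(score)[-k:]          # top-k awake arms
            rewards = rng.random(k) < means[chosen]  # semi-bandit feedback
            alpha[chosen] += rewards
            beta[chosen] += ~rewards
            pulls[chosen] += 1
            selected[chosen] = 1.0
        # Queue grows by the required fraction, shrinks when the arm is pulled.
        queue = np.maximum(queue + min_frac - selected, 0.0)

    return pulls / horizon, queue
```

Under this sketch, a small `eta` lets the queues dominate the score and favors fairness, while a large `eta` makes the rule approach plain top-`m` Thompson Sampling, matching the abstract's description of $η$ as trading off fairness for rewards. By construction of the queue update, each arm's selection fraction is at least `min_frac - queue / horizon`, so bounded queues imply the fairness target is met in the long run.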

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Mobile Crowdsensing and Crowdsourcing

Methods: Spatio-temporal stability analysis