Achieving Fairness in Stochastic Multi-armed Bandit Problem
Vishakha Patil, Ganesh Ghalme, Vineet Nair, Y. Narahari

TL;DR
This paper introduces the Fair-SMAB problem, integrating fairness constraints into stochastic multi-armed bandits, and proposes algorithms with provable fairness guarantees and low regret.
Contribution
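It characterizes a class of Fair-SMAB algorithms by two parameters, the unfairness tolerance and the learning algorithm used as a black box, and gives a fairness guarantee that holds regardless of which learning algorithm is plugged in.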
Findings
Achieves O(log T) r-Regret when the learning algorithm is UCB1.
Provides fairness guarantees that hold uniformly over time.
Analyzes the cost of fairness in traditional regret terms.
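To make the findings above concrete, here is one plausible way to write the fairness requirement and the fairness-aware regret. This is a sketch only. The symbols are assumptions chosen for illustration and do not appear in this summary: $k$ arms, guaranteed fractions $r_i$, pull counts $N_{i,t}$ after $t$ rounds, unfairness tolerance $\alpha \ge 0$, and the gap $\Delta_i$ between the best mean reward and the mean reward of arm $i$.

```latex
% Fairness that holds uniformly over time (assumed form):
\lfloor r_i \, t \rfloor - N_{i,t} \le \alpha
\qquad \text{for all rounds } t \ge 1 \text{ and all arms } i \in \{1, \dots, k\}.

% r-Regret (assumed form): only pulls beyond the guaranteed share are charged.
\mathcal{R}^{r}(T) \;=\; \sum_{i=1}^{k} \Delta_i \,\bigl( \mathbb{E}[N_{i,T}] - \lfloor r_i \, T \rfloor \bigr).
```

Under this reading, setting every $r_i = 0$ recovers the conventional regret $\sum_i \Delta_i \, \mathbb{E}[N_{i,T}]$, which is consistent with the statement that r-Regret extends the usual notion.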
Abstract
We study an interesting variant of the stochastic multi-armed bandit problem, called the Fair-SMAB problem, where each arm is required to be pulled for at least a given fraction of the total available rounds. We investigate the interplay between learning and fairness in terms of a pre-specified vector denoting the fractions of guaranteed pulls. We define a fairness-aware regret, called r-Regret, that takes into account the above fairness constraints and naturally extends the conventional notion of regret. Our primary contribution is characterizing a class of Fair-SMAB algorithms by two parameters: the unfairness tolerance and learning algorithm used as a black-box. We provide a fairness guarantee for this class that holds uniformly over time irrespective of the choice of the learning algorithm. In particular, when the learning algorithm is UCB1, we show that our algorithm achieves…
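The idea of wrapping a black-box learner in a fairness check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the deficit rule (force the arm whose shortfall `r_i * (t - 1) - N_i` exceeds the tolerance), the Bernoulli rewards, and the names `fair_ucb1`, `ucb1_choice`, `fractions`, and `alpha` are all hypothetical choices made for this example.

```python
import math
import random


def ucb1_choice(counts, sums, t):
    """Standard UCB1 rule: try each arm once, then maximize mean + bonus."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
    )


def fair_ucb1(means, fractions, alpha, horizon, seed=0):
    """Fairness-aware wrapper around UCB1 (illustrative sketch).

    means     -- true Bernoulli means, used only to simulate rewards
    fractions -- assumed guaranteed pull fraction r_i per arm (sum below 1)
    alpha     -- assumed unfairness tolerance (non-negative)
    Returns the pull counts and the largest shortfall r_i * t - N_i observed.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    max_deficit = 0.0
    for t in range(1, horizon + 1):
        # Shortfall of each arm relative to its guaranteed share so far.
        deficits = [fractions[i] * (t - 1) - counts[i] for i in range(k)]
        worst = max(range(k), key=lambda i: deficits[i])
        if deficits[worst] > alpha:
            arm = worst  # fairness step: serve the most under-pulled arm
        else:
            arm = ucb1_choice(counts, sums, t)  # learning step (black box)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        round_deficit = max(fractions[i] * t - counts[i] for i in range(k))
        max_deficit = max(max_deficit, round_deficit)
    return counts, max_deficit


if __name__ == "__main__":
    pulls, shortfall = fair_ucb1([0.9, 0.5, 0.1], [0.1, 0.2, 0.3], alpha=1.0, horizon=5000)
    print("pulls per arm:", pulls)
    print("largest shortfall seen:", shortfall)
```

The learner is only consulted when no arm is too far behind its guaranteed share, so swapping `ucb1_choice` for another index rule leaves the fairness step untouched. That separation is the design choice the abstract points to when it calls the learning algorithm a black box.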
Taxonomy
Topics: Auction Theory and Applications · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
