Calibrated Fairness in Bandits
Yang Liu, Goran Radanovic, Christos Dimitrakakis, Debmalya Mandal, David C. Parkes

TL;DR
This paper introduces a fairness framework for stochastic multi-armed bandits. It proposes a smoothness constraint and a fairness regret measure, shows that a variation on Thompson sampling satisfies smooth fairness, and extends the approach to dueling bandits.
Contribution
It adapts fairness concepts to bandits using smoothness constraints and fairness regret, providing a Thompson sampling-based algorithm with theoretical guarantees.
Findings
Thompson sampling variation satisfies smooth fairness for total variation distance.
Achieves an $\tilde{O}((kT)^{2/3})$ fairness regret bound.
Extension to dueling bandit setting demonstrated.
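To make the Thompson sampling finding more concrete, here is a minimal sketch of one way such a variation could look. It is not the authors' exact algorithm. It assumes Bernoulli rewards with independent Beta(1, 1) priors, and the names `calibrated_thompson_step` and `run` are hypothetical. The idea illustrated is that each round samples a plausible quality realization for every arm (rather than comparing posterior means) and plays the arm with the best sampled realization.

```python
import random


def calibrated_thompson_step(successes, failures, rng):
    """Pick an arm by sampling a plausible quality realization for each arm.

    Assumption (not from the source): Bernoulli rewards, Beta(1, 1) priors.
    """
    draws = []
    for s, f in zip(successes, failures):
        theta = rng.betavariate(1 + s, 1 + f)       # posterior sample of the arm's mean
        reward = 1 if rng.random() < theta else 0   # sampled quality realization
        draws.append(reward)
    best = max(draws)
    # Break ties uniformly at random so identical arms are treated identically.
    return rng.choice([i for i, r in enumerate(draws) if r == best])


def run(true_means, horizon, seed=0):
    """Simulate the sketch on hypothetical Bernoulli arms; return pull counts."""
    rng = random.Random(seed)
    k = len(true_means)
    successes, failures, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(horizon):
        arm = calibrated_thompson_step(successes, failures, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run([0.2, 0.8], horizon=500))
```

Because the choice depends on sampled realizations, two arms with similar posterior distributions are selected with similar probability in this sketch, which is the intuition behind the smoothness constraint.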
Abstract
We study fairness within the stochastic, \emph{multi-armed bandit} (MAB) decision making framework. We adapt the fairness framework of "treating similar individuals similarly" to this setting. Here, an `individual' corresponds to an arm and two arms are `similar' if they have a similar quality distribution. First, we adopt a {\em smoothness constraint} that if two arms have a similar quality distribution then the probability of selecting each arm should be similar. In addition, we define the {\em fairness regret}, which corresponds to the degree to which an algorithm is not calibrated, where perfect calibration requires that the probability of selecting an arm is equal to the probability with which the arm has the best quality realization. We show that a variation on Thompson sampling satisfies smooth fairness for total variation distance, and give an bound on…
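The calibration notion in the abstract can be illustrated with a small worked example. The sketch below computes, for independent Bernoulli arms, the probability that each arm has the best quality realization, and then measures how far a selection policy falls short of those probabilities. Several details are assumptions not stated in the text above: ties are split uniformly, the per-round gap uses the one-sided form $\sum_i \max(P^*(i) - \pi(i), 0)$, and the function names are hypothetical.

```python
from itertools import product


def best_arm_probabilities(means):
    """Exact P(arm i has the best realization) for independent Bernoulli arms.

    Assumption: ties are split uniformly among the tied arms.
    """
    k = len(means)
    probs = [0.0] * k
    for outcome in product((0, 1), repeat=k):
        # Probability of this joint outcome under independence.
        p = 1.0
        for r, m in zip(outcome, means):
            p *= m if r == 1 else 1.0 - m
        best = max(outcome)
        winners = [i for i, r in enumerate(outcome) if r == best]
        for i in winners:
            probs[i] += p / len(winners)
    return probs


def calibration_gap(policy, means):
    """Per-round shortfall of a policy relative to the calibrated target.

    Assumed form: sum over arms of max(target - policy, 0).
    """
    target = best_arm_probabilities(means)
    return sum(max(t - q, 0.0) for t, q in zip(target, policy))


if __name__ == "__main__":
    means = [0.2, 0.8]
    print(best_arm_probabilities(means))
    print(calibration_gap([0.5, 0.5], means))
```

With means 0.2 and 0.8, the calibrated target selects the arms with probabilities 0.2 and 0.8, so a uniform policy has a gap of 0.3 under the assumed form, and a policy that matches the target has a gap of zero.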
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Decision-Making and Behavioral Economics
