A Regret bound for Non-stationary Multi-Armed Bandits with Fairness Constraints

Shaarad A. R; Ambedkar Dukkipati

arXiv:2012.13380·cs.LG·December 25, 2020

TL;DR

This paper introduces Fair-UCBe, a fair algorithm for non-stationary multi-armed bandits that satisfies a fairness constraint at every round while achieving sublinear regret.

Contribution

The paper proposes the Fair-UCBe algorithm, described as the first fair algorithm with a sublinear regret bound in non-stationary bandit settings.

Findings

01

The Fair-UCBe algorithm satisfies fairness constraints.

02

It achieves a regret bound of $O\left(k^{3/2}\, T^{1 - \alpha/2} \sqrt{\log T}\right)$.

03

Its performance approaches that of the stationary case as the variation in the environment decreases.
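
To make the regret bound in finding 02 concrete, the following minimal sketch evaluates the expression $k^{3/2} T^{1 - \alpha/2} \sqrt{\log T}$ and its per-round average for a few horizons. The function names and the constant `c` are assumptions made for illustration: big-O notation hides the constant, so the absolute numbers mean little and only the growth trend is informative.

```python
import math


def regret_bound(k: int, T: int, alpha: float, c: float = 1.0) -> float:
    """Evaluate c * k^(3/2) * T^(1 - alpha/2) * sqrt(log T).

    `c` stands in for the constant hidden by the big-O notation;
    its true value is not given in the source, so 1.0 is a placeholder.
    """
    if k < 1 or T < 2:
        raise ValueError("need k >= 1 and T >= 2")
    return c * k ** 1.5 * T ** (1.0 - alpha / 2.0) * math.sqrt(math.log(T))


def average_regret_bound(k: int, T: int, alpha: float, c: float = 1.0) -> float:
    """Per-round version of the bound: regret_bound / T."""
    return regret_bound(k, T, alpha, c) / T


if __name__ == "__main__":
    # Hypothetical values of k and alpha, chosen only to show the trend.
    for T in (10**3, 10**5, 10**7):
        print(T, average_regret_bound(k=5, T=T, alpha=0.5))
```

For any fixed $\alpha > 0$ the per-round value shrinks as $T$ grows, which is what "sublinear regret" means here. With $\alpha = 0$ the expression grows faster than $T$, so the bound would no longer be informative.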

Abstract

The multi-armed bandit framework is the most common platform for studying strategies for sequential decision-making problems. Recently, the notion of fairness has attracted a lot of attention in the machine learning community. One can impose the fairness condition that at any given point in time, even during the learning phase, a poorly performing candidate should not be preferred over a better candidate. This fairness constraint is known to be one of the most stringent and has been studied in the stochastic multi-armed bandit framework in a stationary setting, for which regret bounds have been established. The main aim of this paper is to study this problem in a non-stationary setting. We present a new algorithm called Fair Upper Confidence Bound with Exploration (Fair-UCBe) for solving a slowly varying stochastic $k$-armed bandit problem. With this we present two results: (i)…
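
The fairness condition in the abstract can be read as a per-round check on the algorithm's selection probabilities. The sketch below is one such reading, not the paper's formal definition: it assumes that "preferred" means "selected with strictly higher probability", and the function name `is_fair_round` and its arguments are hypothetical. In a non-stationary setting the true means change over time, so the check would apply to the means at the round in question.

```python
from typing import Sequence


def is_fair_round(
    probs: Sequence[float],
    means: Sequence[float],
    tol: float = 1e-12,
) -> bool:
    """Check one round against the informal fairness condition.

    probs[i] is the probability of selecting arm i this round and
    means[i] is arm i's true expected reward this round. The round is
    fair if no arm with a strictly lower mean is selected with a
    strictly higher probability than a better arm.
    """
    if len(probs) != len(means):
        raise ValueError("probs and means must have the same length")
    n = len(means)
    for i in range(n):
        for j in range(n):
            worse = means[i] < means[j]
            preferred = probs[i] > probs[j] + tol
            if worse and preferred:
                return False
    return True


if __name__ == "__main__":
    means = [0.2, 0.8]
    print(is_fair_round([0.5, 0.5], means))  # uniform play
    print(is_fair_round([0.7, 0.3], means))  # favours the worse arm
```

Uniform play always passes this check, which is why a fair algorithm can explore uniformly among arms it cannot yet tell apart; favouring an arm is only safe once it is known to be no worse than the others.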

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques