A Regret bound for Non-stationary Multi-Armed Bandits with Fairness Constraints
Shaarad A. R, Ambedkar Dukkipati

TL;DR
This paper introduces Fair-UCBe, a fair algorithm for slowly varying non-stationary multi-armed bandits that satisfies the fairness constraint while achieving sublinear regret.
Contribution
It proposes the Fair-UCBe algorithm, the first to provide a sublinear regret bound under fairness constraints in non-stationary bandit settings.
Findings
The Fair-UCBe algorithm satisfies fairness constraints.
It achieves a regret bound of O(k^{3/2} T^{1 - α/2} √(log T)), where k is the number of arms and T is the time horizon.
Its performance approaches that of the stationary case as the variation in the environment decreases.
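To get a feel for what "sublinear" means for the bound above, here is a minimal numeric sketch. It evaluates the expression inside the O(·) with the hidden constant set to 1, and it treats α as a free parameter in (0, 1]; both are assumptions made for illustration, since this summary does not define α. The function names are hypothetical.

```python
import math


def regret_bound(k: int, T: int, alpha: float) -> float:
    """Evaluate k^{3/2} * T^{1 - alpha/2} * sqrt(log T).

    The constant hidden by the O(.) notation is taken to be 1 (assumption).
    """
    return k ** 1.5 * T ** (1.0 - alpha / 2.0) * math.sqrt(math.log(T))


def per_round_bound(k: int, T: int, alpha: float) -> float:
    """Bound divided by the horizon; it shrinks with T when the bound is sublinear."""
    return regret_bound(k, T, alpha) / T


if __name__ == "__main__":
    k, alpha = 5, 0.5  # illustrative values, not taken from the paper
    for T in (10**3, 10**6, 10**9):
        print(T, per_round_bound(k, T, alpha))
```

Setting α = 1 in this expression gives k^{3/2} √(T log T), so a larger α corresponds to a bound with the square-root growth in T that is typical of stationary bandit problems.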
Abstract
The multi-armed bandit framework is the most common platform for studying strategies for sequential decision-making problems. Recently, the notion of fairness has attracted a lot of attention in the machine learning community. One can impose the fairness condition that at any given point in time, even during the learning phase, a poorly performing candidate should not be preferred over a better candidate. This fairness constraint is known to be one of the most stringent, and it has been studied in the stationary stochastic multi-armed bandit setting, for which regret bounds have been established. The main aim of this paper is to study this problem in a non-stationary setting. We present a new algorithm, called Fair Upper Confidence Bound with Exploration (Fair-UCBe), for solving a slowly varying stochastic k-armed bandit problem. With this we present two results: (i)…
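The fairness condition in the abstract can be made concrete with a small sketch. One way such a constraint is commonly enforced in the stationary fair-bandit literature is to never separate two arms whose confidence intervals are linked by a chain of overlaps, and to choose uniformly at random among all arms linked to the one with the highest upper bound. The code below illustrates that generic idea only. It is not the paper's Fair-UCBe procedure, which the truncated abstract does not spell out, and the names `chained_to_best` and `fair_choice` are hypothetical.

```python
import random
from typing import List, Tuple

# (lower, upper) confidence bounds on an arm's mean reward
Interval = Tuple[float, float]


def chained_to_best(intervals: List[Interval]) -> List[int]:
    """Return indices of arms linked, through overlapping intervals,
    to the arm with the highest upper confidence bound."""
    best = max(range(len(intervals)), key=lambda i: intervals[i][1])
    active = {best}
    changed = True
    while changed:
        changed = False
        for i, (lo_i, hi_i) in enumerate(intervals):
            if i in active:
                continue
            # Two intervals overlap when each starts before the other ends.
            if any(lo_i <= intervals[j][1] and intervals[j][0] <= hi_i
                   for j in active):
                active.add(i)
                changed = True
    return sorted(active)


def fair_choice(intervals: List[Interval], rng: random.Random) -> int:
    """Pick uniformly among arms that cannot yet be confidently ranked
    below the apparent best arm."""
    return rng.choice(chained_to_best(intervals))


if __name__ == "__main__":
    # Illustrative intervals, not data from the paper.
    arms = [(0.10, 0.30), (0.25, 0.60), (0.55, 0.90), (0.00, 0.05)]
    print(chained_to_best(arms))
    print(fair_choice(arms, random.Random(0)))
```

In the example, the last arm's interval lies strictly below every other interval, so it is excluded, while the first arm stays in the candidate set because it is linked to the top arm through the second arm's interval.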
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
