TL;DR
This paper introduces RewardFairUCB, an algorithm for multi-agent multi-armed bandits that balances maximizing social welfare with ensuring a minimum reward guarantee for fairness, achieving sublinear regret bounds.
Contribution
The paper proposes RewardFairUCB, a UCB-based algorithm that targets a minimum reward guarantee for every agent while optimizing social welfare in multi-agent bandit settings, with proven sublinear regret bounds for both objectives.
Findings
RewardFairUCB achieves $\tilde{O}(T^{1/2})$ social welfare regret.
RewardFairUCB attains $\tilde{O}(T^{3/4})$ fairness regret.
Lower bounds of $\Omega(\sqrt{T})$ hold for both regrets.
Abstract
We investigate the problem of maximizing social welfare while ensuring fairness in a multi-agent multi-armed bandit (MA-MAB) setting. In this problem, a centralized decision-maker takes actions over time, generating random rewards for various agents. Our goal is to maximize the sum of expected cumulative rewards, a.k.a. social welfare, while ensuring that each agent receives an expected reward that is at least a constant fraction of the maximum possible expected reward. Our proposed algorithm, RewardFairUCB, leverages the Upper Confidence Bound (UCB) technique to achieve sublinear regret bounds for both fairness and social welfare. The fairness regret measures the positive difference between the minimum reward guarantee and the expected reward of a given policy, whereas the social welfare regret measures the difference between the social welfare of the optimal fair policy and that of…
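The two quantities the abstract describes can be made concrete with a small sketch. It is illustrative only and is not the paper's algorithm: it assumes the mean rewards are known, that a policy is a probability distribution over arms, and that each agent's "maximum possible expected reward" is the mean of that agent's best arm. The names `mean_rewards`, `fractions`, `expected_rewards`, `social_welfare`, and `fairness_shortfall` are hypothetical.

```python
def expected_rewards(mean_rewards, policy):
    """Expected reward of each agent when arms are drawn from `policy`.

    mean_rewards[i][j] is the (assumed known) mean reward of agent i for arm j.
    policy[j] is the probability of pulling arm j.
    """
    return [sum(a * p for a, p in zip(row, policy)) for row in mean_rewards]


def social_welfare(mean_rewards, policy):
    """Sum of the agents' expected rewards under `policy`."""
    return sum(expected_rewards(mean_rewards, policy))


def fairness_shortfall(mean_rewards, policy, fractions):
    """Per-round fairness gap: positive part of (guarantee - expected reward),
    summed over agents. The guarantee for agent i is assumed to be
    fractions[i] times the mean of agent i's best arm."""
    total = 0.0
    rewards = expected_rewards(mean_rewards, policy)
    for row, c, r in zip(mean_rewards, fractions, rewards):
        guarantee = c * max(row)
        total += max(guarantee - r, 0.0)
    return total


# Two agents, two arms; each agent is guaranteed half of its best-arm mean.
mean_rewards = [[1.0, 0.0],
                [0.0, 0.5]]
fractions = [0.5, 0.5]

greedy = [1.0, 0.0]   # always pull the welfare-maximizing arm
mixed = [0.5, 0.5]    # split pulls evenly between the arms

greedy_welfare = social_welfare(mean_rewards, greedy)
mixed_welfare = social_welfare(mean_rewards, mixed)
greedy_gap = fairness_shortfall(mean_rewards, greedy, fractions)
mixed_gap = fairness_shortfall(mean_rewards, mixed, fractions)
```

In this toy instance the greedy policy has the higher social welfare but leaves the second agent below its guarantee, while the mixed policy meets both guarantees at a lower welfare. Summing the shortfall over rounds gives a fairness regret in the sense sketched above; the social welfare regret would compare each round's welfare against that of the best policy satisfying every guarantee.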
