An Efficient Algorithm for Fair Multi-Agent Multi-Armed Bandit with Low Regret
Matthew Jones, Huy Lê Nguyen, Thy Nguyen

TL;DR
This paper introduces a new efficient multi-agent multi-armed bandit algorithm that optimizes fairness via Nash social welfare with significantly lower regret, outperforming previous methods both theoretically and experimentally.
Contribution
The paper presents a novel efficient algorithm for fair multi-agent bandits with improved regret bounds, advancing the state-of-the-art in fairness-aware online learning.
Findings
The efficient algorithm achieves regret on the order of √(NKT) + NK.
Experimental results show the new algorithm outperforms previous approaches.
The paper also provides a less efficient method with different regret bounds.
Abstract
Recently, a multi-agent variant of the classical multi-armed bandit was proposed to tackle fairness issues in online learning. Inspired by a long line of work in social choice and economics, the goal is to optimize the Nash social welfare instead of the total utility. Unfortunately, previous algorithms either are not efficient or achieve sub-optimal regret in terms of the number of rounds T. We propose a new efficient algorithm with lower regret than even previous inefficient ones. For N agents, K arms, and T rounds, our approach has a regret bound on the order of √(NKT) + NK. This improves on the regret bound of the previous approach. We also complement our efficient algorithm with an inefficient approach that has a different regret bound. The experimental findings confirm the effectiveness of our…
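To make the objective concrete, here is a minimal Python sketch of Nash social welfare and the regret measured against it. It assumes the definition usual in this line of work, which the abstract does not spell out: a policy is a probability distribution over the K arms, and its Nash social welfare is the product over the N agents of each agent's expected reward. The function names and the 2-agent, 2-arm instance are hypothetical illustrations, not taken from the paper, and the sketch does not implement the paper's algorithm.

```python
import math


def nash_social_welfare(policy, mu):
    """Product over agents of the expected reward under `policy`.

    policy: list of K probabilities over arms (assumed to sum to 1).
    mu:     N x K nested list; mu[i][j] is agent i's mean reward for arm j.
    """
    return math.prod(
        sum(p * m for p, m in zip(policy, agent_means))
        for agent_means in mu
    )


def cumulative_regret(policies, mu, best_policy):
    """Sum over rounds of the welfare lost relative to `best_policy`."""
    best = nash_social_welfare(best_policy, mu)
    return sum(best - nash_social_welfare(p, mu) for p in policies)


# Hypothetical instance: each of two agents is rewarded by a different arm.
mu = [
    [1.0, 0.0],  # agent 0 only benefits from arm 0
    [0.0, 1.0],  # agent 1 only benefits from arm 1
]

# Always pulling arm 0 maximizes agent 0's reward but leaves agent 1 with
# nothing, so the product is zero.
nsw_arm0 = nash_social_welfare([1.0, 0.0], mu)

# Mixing evenly gives each agent expected reward 0.5, so the product is 0.25,
# which is the maximum of p * (1 - p) for this instance.
nsw_mixed = nash_social_welfare([0.5, 0.5], mu)

# Regret of committing to arm 0 for three rounds versus the even mix.
regret = cumulative_regret([[1.0, 0.0]] * 3, mu, [0.5, 0.5])
```

On this instance, total utility is 1 under both policies, so only the product objective distinguishes the fair mix from the one-sided policy.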
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems
