Fairness and Welfare Quantification for Regret in Multi-Armed Bandits
Siddharth Barman, Arindam Khan, Arnab Maiti, Ayush Sawarni

TL;DR
This paper introduces a welfare-based approach to quantify regret in multi-armed bandits using Nash social welfare, proposing algorithms with near-optimal Nash regret guarantees that incorporate fairness considerations.
Contribution
It extends regret analysis to a welfare perspective using Nash social welfare, developing algorithms with tight Nash regret bounds in the multi-armed bandit setting.
Findings
Given the horizon of play T, the proposed algorithm achieves Nash regret of O(√((k log T)/T)), where k is the number of arms.
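This Nash regret bound is essentially tight: by the AM-GM inequality, Nash regret is at least as large as average regret, so the known lower bound on average regret applies to Nash regret as well.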
An anytime algorithm, which does not need the horizon in advance, achieves Nash regret of O(√((k log T)/T) · log T).
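The tightness finding rests on a one-line comparison between the two regret notions, sketched below. The notation is an assumption made for illustration and does not appear in this summary: $\mu^*$ is the optimal mean, $I_t$ is the arm pulled in round $t$, and $\mathrm{NR}_T$ and $\mathrm{R}_T$ label the Nash and average regret.

```latex
\[
\mathrm{NR}_T \;=\; \mu^* - \Big(\prod_{t=1}^{T} \mathbb{E}\big[\mu_{I_t}\big]\Big)^{1/T},
\qquad
\mathrm{R}_T \;=\; \mu^* - \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}\big[\mu_{I_t}\big].
\]
% AM-GM: the geometric mean of nonnegative numbers never exceeds their arithmetic mean.
\[
\Big(\prod_{t=1}^{T} \mathbb{E}\big[\mu_{I_t}\big]\Big)^{1/T}
\;\le\;
\frac{1}{T}\sum_{t=1}^{T} \mathbb{E}\big[\mu_{I_t}\big]
\quad\Longrightarrow\quad
\mathrm{NR}_T \;\ge\; \mathrm{R}_T .
\]
```

Any lower bound on $\mathrm{R}_T$ therefore carries over to $\mathrm{NR}_T$.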
Abstract
We extend the notion of regret with a welfarist perspective. Focussing on the classic multi-armed bandit (MAB) framework, the current work quantifies the performance of bandit algorithms by applying a fundamental welfare function, namely the Nash social welfare (NSW) function. This corresponds to equating an algorithm's performance to the geometric mean of its expected rewards and leads us to the study of Nash regret, defined as the difference between the -- a priori unknown -- optimal mean (among the arms) and the algorithm's performance. Since NSW is known to satisfy fairness axioms, our approach complements the utilitarian considerations of average (cumulative) regret, wherein the algorithm is evaluated via the arithmetic mean of its expected rewards. This work develops an algorithm that, given the horizon of play $T$, achieves a Nash regret of $O \left( \sqrt{\frac{{k \log T}}{T}}…
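To make the contrast between the geometric and arithmetic means concrete, here is a minimal Python sketch that computes both regret notions from a sequence of per-round expected rewards. The function names, the example reward sequences, and the value of the optimal mean are hypothetical, chosen only for illustration. This is not the paper's algorithm, only an evaluation of the two definitions described in the abstract.

```python
import math


def nash_regret(mu_star, expected_rewards):
    """Optimal mean minus the geometric mean of per-round expected rewards."""
    if any(r < 0 for r in expected_rewards):
        raise ValueError("expected rewards are assumed to be nonnegative")
    if any(r == 0 for r in expected_rewards):
        # A single zero-reward round drives the geometric mean to zero.
        return mu_star
    # Work in log space so long horizons do not underflow the product.
    log_mean = sum(math.log(r) for r in expected_rewards) / len(expected_rewards)
    return mu_star - math.exp(log_mean)


def average_regret(mu_star, expected_rewards):
    """Optimal mean minus the arithmetic mean of per-round expected rewards."""
    return mu_star - sum(expected_rewards) / len(expected_rewards)


# Hypothetical example: optimal mean 0.9, horizon of 10 rounds.
mu_star = 0.9
steady = [0.5] * 10               # same expected reward every round
uneven = [0.9] * 5 + [0.1] * 5    # same arithmetic mean, but very uneven rounds

for name, rewards in [("steady", steady), ("uneven", uneven)]:
    print(name, average_regret(mu_star, rewards), nash_regret(mu_star, rewards))
```

Both sequences have the same average regret, but the uneven one has a larger Nash regret. The geometric mean penalizes rounds with low expected reward, which is the fairness consideration the abstract refers to.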
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Stochastic Gradient Optimization Techniques
