Revisiting Social Welfare in Bandits: UCB is (Nearly) All You Need

Dhruv Sarkar; Nishant Pandey; Sayak Ray Chowdhury

arXiv:2510.21312·cs.LG·October 27, 2025

TL;DR

This paper shows that a simple UCB algorithm, with an initial exploration phase, effectively minimizes fairness-aware Nash regret in stochastic bandits, extending to a broad class of fairness metrics with near-optimal guarantees.

Contribution

The paper demonstrates that a standard UCB algorithm, preceded by an initial exploration phase, suffices for near-optimal Nash regret, removing the need for specialized algorithms that depend on strong assumptions.
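The idea of "initial exploration, then UCB" can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function name `ucb_with_initial_exploration`, the round-robin exploration schedule, the length `explore_rounds`, and the confidence radius `sigma * sqrt(2 log T / n)` are all assumptions chosen for simplicity, and the simulated rewards are Gaussian.

```python
import math
import random


def ucb_with_initial_exploration(means, horizon, explore_rounds, sigma=1.0, seed=0):
    """Simulate UCB preceded by a round-robin exploration phase.

    All parameter choices here are illustrative assumptions, not values
    taken from the paper.
    """
    k = len(means)
    if explore_rounds < k:
        raise ValueError("explore_rounds must cover every arm at least once")

    rng = random.Random(seed)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # sum of observed rewards per arm
    pulls = []            # sequence of chosen arms

    def ucb_index(i):
        # Empirical mean plus a sub-Gaussian style confidence radius (assumed form).
        radius = sigma * math.sqrt(2.0 * math.log(horizon) / counts[i])
        return sums[i] / counts[i] + radius

    for t in range(horizon):
        if t < explore_rounds:
            arm = t % k                          # exploration phase: cycle through arms
        else:
            arm = max(range(k), key=ucb_index)   # UCB phase: pick the largest index
        reward = rng.gauss(means[arm], sigma)    # Gaussian reward, may be negative
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)

    return pulls, counts


pulls, counts = ucb_with_initial_exploration(
    means=[0.2, 0.5, 0.9], horizon=2000, explore_rounds=30, sigma=0.1, seed=0
)
print(counts)
```

In this sketch the exploration phase only guarantees that every arm has been sampled before the UCB index is evaluated; how long that phase should be to obtain the paper's guarantees is not specified here.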

Findings

01. UCB with exploration achieves near-optimal Nash regret.

02. The approach extends to sub-Gaussian rewards.

03. The method generalizes to p-mean regret with strong guarantees.
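For readers unfamiliar with p-means, the following sketch shows the standard generalized (power) mean, of which the arithmetic mean (p = 1) and the geometric mean (the limit p → 0) are special cases. The helper name `p_mean` is hypothetical, the inputs are assumed strictly positive, and which range of p the paper's guarantees cover is not stated in this summary.

```python
import math


def p_mean(values, p):
    """Generalized p-mean of strictly positive values (illustrative helper)."""
    n = len(values)
    if p == 0:
        # The p -> 0 limit of the power mean is the geometric mean.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


values = [1.0, 4.0]
print(p_mean(values, 1))    # arithmetic mean
print(p_mean(values, 0))    # geometric mean
print(p_mean(values, -1))   # harmonic mean
```

A p-mean regret would then compare the best arm's mean against the p-mean of the rewards collected, so smaller p penalizes low-reward rounds more heavily.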

Abstract

Regret in stochastic multi-armed bandits traditionally measures the difference between the highest reward and either the arithmetic mean of accumulated rewards or the final reward. These conventional metrics often fail to address fairness among agents receiving rewards, particularly in settings where rewards are distributed across a population, such as patients in clinical trials. To address this, a recent body of work has introduced Nash regret, which evaluates performance via the geometric mean of accumulated rewards, aligning with the Nash social welfare function known for satisfying fairness axioms. To minimize Nash regret, existing approaches require specialized algorithm designs and strong assumptions, such as multiplicative concentration inequalities and bounded, non-negative rewards, making them unsuitable for even Gaussian reward distributions. We demonstrate that an initial…
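To make the contrast between average regret and Nash regret concrete, here is a small numerical sketch. It is a simplification under stated assumptions: both quantities are computed from the true means of the arms pulled along a single hypothetical trajectory (rather than in expectation over the algorithm's randomness), the means are assumed strictly positive, and the function names are illustrative.

```python
import math


def average_regret(best_mean, pulled_means):
    # Best mean minus the arithmetic mean of the pulled arms' means.
    return best_mean - sum(pulled_means) / len(pulled_means)


def nash_regret(best_mean, pulled_means):
    # Best mean minus the geometric mean of the pulled arms' means,
    # computed in log space for numerical stability.
    log_gm = sum(math.log(m) for m in pulled_means) / len(pulled_means)
    return best_mean - math.exp(log_gm)


# Hypothetical trajectory: 90 rounds on the best arm (mean 0.9),
# 10 rounds on a poor arm (mean 0.1).
pulled = [0.9] * 90 + [0.1] * 10
print(average_regret(0.9, pulled))
print(nash_regret(0.9, pulled))
```

Because the geometric mean never exceeds the arithmetic mean, the Nash regret in this sketch is at least the average regret, which is why a few low-reward rounds cost more under the Nash criterion.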
