No-Regret Learning for Fair Multi-Agent Social Welfare Optimization

Mengxiao Zhang; Ramiro Deo-Campo Vuong; Haipeng Luo

arXiv:2405.20678·cs.LG·June 3, 2024

TL;DR

This paper investigates the feasibility of no-regret learning for maximizing Nash social welfare in multi-agent online settings, revealing fundamental limits and proposing algorithms with optimal regret bounds under various feedback models.

Contribution

It provides the first comprehensive analysis of no-regret learning for NSW maximization, including tight bounds in stochastic and adversarial environments, and introduces algorithms with optimal regret.

Findings

01

Achieves near-optimal regret bounds in stochastic multi-armed bandits for NSW.

02

Proves impossibility of sublinear regret in adversarial reward settings.

03

Designs algorithms with $\sqrt{T}$-regret in full-information feedback scenarios.

Abstract

We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While previous works of Hossain et al. [2021] and Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret is possible after $T$ rounds, their fairness measure is the product of all agents' rewards, instead of their NSW (that is, their geometric mean). Given the fundamental role of NSW in the fairness literature, it is more than natural to ask whether no-regret fair learning with NSW as the objective is possible. In this work, we provide a complete answer to this question in various settings. Specifically, in stochastic $N$-agent $K$-armed bandits, we develop an algorithm with $O(K^{\frac{2}{N}} T^{\frac{N-1}{N}})$ regret and prove that the dependence on $T$ is tight, making it a sharp contrast to the…
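To make the distinction in the abstract concrete, the following is a minimal sketch of the two fairness measures it contrasts: the product of the agents' expected rewards and their geometric mean (NSW). The representation is an assumption made for illustration only: `u[k][n]` is taken to be the mean reward of arm `k` for agent `n`, with values in [0, 1], and `p` is a distribution over the `K` arms. The function names are hypothetical and are not taken from the paper.

```python
import math


def expected_utilities(p, u):
    """Expected reward of each agent when an arm is drawn from distribution p.

    Assumption for this sketch: u[k][n] is the mean reward of arm k for agent n.
    """
    n_agents = len(u[0])
    return [sum(p[k] * u[k][n] for k in range(len(p))) for n in range(n_agents)]


def product_welfare(p, u):
    """Product of all agents' expected rewards (the measure the abstract attributes to prior work)."""
    return math.prod(expected_utilities(p, u))


def nash_social_welfare(p, u):
    """Geometric mean of all agents' expected rewards (NSW)."""
    vals = expected_utilities(p, u)
    return math.prod(vals) ** (1.0 / len(vals))


def t_exponent(n_agents):
    """Exponent of T in the stochastic-bandit bound quoted in the abstract, (N - 1) / N."""
    return (n_agents - 1) / n_agents


# Two arms, two agents: each agent only benefits from "their" arm.
u = [[1.0, 0.0],
     [0.0, 1.0]]
uniform = [0.5, 0.5]   # split pulls evenly between the arms
greedy = [1.0, 0.0]    # always pull arm 0, which starves agent 1

print(nash_social_welfare(uniform, u), product_welfare(uniform, u))
print(nash_social_welfare(greedy, u))
print([t_exponent(n) for n in (2, 3, 10)])
```

In this toy instance both measures prefer the even split over the greedy choice, but they take different values, so regret measured against one is not the same quantity as regret measured against the other. The last line evaluates the exponent $\frac{N-1}{N}$ from the quoted bound for a few values of $N$: it equals $\frac{1}{2}$ for two agents and moves toward 1 as $N$ grows.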

Peer Reviews

Decision·NeurIPS 2024 poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The paper studies an interesting and technically challenging setting. The writing is relatively good, and the authors clearly explain their contributions and the crucial differences between their setting and classical online concave optimization with bandit feedback. I found it particularly interesting that there is a significant discrepancy in the possible regret bounds between online concave optimization and the bandit feedback case of the considered setting. Additionally, I appreciate that th

Weaknesses

Despite the paper's solid technical contribution, my only concern lies with the motivation for the setting. The authors briefly mention that the setting has applications in resource allocation but do not provide any concrete examples or a convincing discussion on why this setting is particularly interesting. While I do not doubt that the setting is indeed interesting, I believe a detailed discussion on the potential applications of the model would significantly enhance the paper.

Videos

No-Regret Learning for Fair Multi-Agent Social Welfare Optimization · slideslive

Taxonomy

Topics · Transportation and Mobility Innovations