No-Regret Learning for Fair Multi-Agent Social Welfare Optimization
Mengxiao Zhang, Ramiro Deo-Campo Vuong, Haipeng Luo

TL;DR
This paper investigates the feasibility of no-regret learning for maximizing Nash social welfare in multi-agent online settings, revealing fundamental limits and proposing algorithms with optimal regret bounds under various feedback models.
Contribution
It provides the first comprehensive analysis of no-regret learning for NSW maximization, including tight bounds in stochastic and adversarial environments, and introduces algorithms with optimal regret.
Findings
Achieves near-optimal regret bounds in stochastic multi-armed bandits for NSW.
Proves that no algorithm can achieve sublinear regret when rewards are adversarial and only bandit feedback is available.
Designs algorithms with $\sqrt{T}$-regret in full-information feedback scenarios.
Abstract
We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While previous works of Hossain et al. [2021], Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret is possible after $T$ rounds, their fairness measure is the product of all agents' rewards, instead of their NSW (that is, their geometric mean). Given the fundamental role of NSW in the fairness literature, it is more than natural to ask whether no-regret fair learning with NSW as the objective is possible. In this work, we provide a complete answer to this question in various settings. Specifically, in stochastic $N$-agent $K$-armed bandits, we develop an algorithm with $\widetilde{O}(K^{2/N} T^{(N-1)/N})$ regret and prove that the dependence on $T$ is tight, making it a sharp contrast to the…
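The distinction the abstract draws between the product of agents' rewards and their geometric mean (the NSW) can be illustrated with a minimal sketch. It assumes a learner that plays a distribution `p` over arms and a mean-reward table `u` where `u[k][n]` is agent `n`'s expected reward from arm `k`. The function names, the table layout, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import math

def expected_rewards(p, u):
    """Expected reward of each agent when arms are drawn from distribution p.

    p: list of K arm probabilities (assumed to sum to 1).
    u: K x N table, u[k][n] = mean reward of arm k for agent n (hypothetical layout).
    """
    n_agents = len(u[0])
    return [sum(p[k] * u[k][n] for k in range(len(p))) for n in range(n_agents)]

def product_welfare(p, u):
    """Product of the agents' expected rewards (the measure attributed to prior work)."""
    return math.prod(expected_rewards(p, u))

def nash_social_welfare(p, u):
    """Geometric mean of the agents' expected rewards (the NSW objective)."""
    rewards = expected_rewards(p, u)
    return math.prod(rewards) ** (1.0 / len(rewards))

# Toy instance: 2 arms, 2 agents, each agent benefits from exactly one arm.
u = [[1.0, 0.0],
     [0.0, 1.0]]

uniform = [0.5, 0.5]   # splits play evenly between the two arms
greedy = [1.0, 0.0]    # always plays the arm that only agent 0 likes

nsw_uniform = nash_social_welfare(uniform, u)
nsw_greedy = nash_social_welfare(greedy, u)
prod_uniform = product_welfare(uniform, u)
```

In this toy instance both measures prefer the even split, since the greedy choice leaves one agent with zero reward. The two differ in scale: the product is the NSW raised to the power of the number of agents, which is one reason regret bounds for the two objectives need not match.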
Peer Reviews
Decision: NeurIPS 2024 poster
The paper studies an interesting and technically challenging setting. The writing is relatively good, and the authors clearly explain their contributions and the crucial differences between their setting and classical online concave optimization with bandit feedback. I found it particularly interesting that there is a significant discrepancy in the possible regret bounds between online concave optimization and the bandit feedback case of the considered setting. Additionally, I appreciate that th
Despite the paper's solid technical contribution, my only concern lies with the motivation for the setting. The authors briefly mention that the setting has applications in resource allocation but do not provide any concrete examples or a convincing discussion on why this setting is particularly interesting. While I do not doubt that the setting is indeed interesting, I believe a detailed discussion on the potential applications of the model would significantly enhance the paper.
Taxonomy
Topics: Transportation and Mobility Innovations
