Nash Regret Guarantees for Linear Bandits
Ayush Sawarni, Soumyabrata Pal, and Siddharth Barman

TL;DR
This paper establishes essentially tight upper bounds on Nash regret in stochastic linear bandits, a notion of regret that ties algorithm performance to fairness and collective welfare. It gives guarantees for both finite and infinite arm sets under a sub-Poisson assumption on the rewards.
Contribution
The paper develops a linear bandit algorithm with near-optimal Nash regret. Because Nash regret is measured against the Nash social welfare (the geometric mean of expected rewards), an upper bound on it also serves as a fairness guarantee.
Findings
Achieves Nash regret of O(√(dν/T) log(T|X|)) for finite arm sets.
Provides a bound of O(d^{5/4} ν^{1/2} / √T log(T)) for infinite arm sets.
Because bounded random variables are sub-Poisson, both results hold for bounded, positive rewards.
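To get a feel for how the two rates above scale, the following is a minimal sketch that evaluates the expressions inside the O(·) bounds. It assumes the hidden constants are 1, which the source does not state, so the outputs are only meaningful for comparing growth in d, ν, T, and |X|. The function names `finite_arm_rate` and `infinite_arm_rate` are hypothetical.

```python
import math


def finite_arm_rate(d, nu, T, num_arms):
    """sqrt(d * nu / T) * log(T * |X|), with the hidden constant taken as 1 (assumption)."""
    return math.sqrt(d * nu / T) * math.log(T * num_arms)


def infinite_arm_rate(d, nu, T):
    """d^(5/4) * nu^(1/2) / sqrt(T) * log(T), with the hidden constant taken as 1 (assumption)."""
    return d ** 1.25 * math.sqrt(nu) / math.sqrt(T) * math.log(T)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for T in (10**4, 10**6):
        print(T, finite_arm_rate(10, 1.0, T, 100), infinite_arm_rate(10, 1.0, T))
```

Both expressions shrink roughly like 1/√T up to the logarithmic factor, so a longer horizon drives the Nash regret toward zero in either setting.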
Abstract
We obtain essentially tight upper bounds for a strengthened notion of regret in the stochastic linear bandits framework. The strengthening -- referred to as Nash regret -- is defined as the difference between the (a priori unknown) optimum and the geometric mean of expected rewards accumulated by the linear bandit algorithm. Since the geometric mean corresponds to the well-studied Nash social welfare (NSW) function, this formulation quantifies the performance of a bandit algorithm as the collective welfare it generates across rounds. NSW is known to satisfy fairness axioms and, hence, an upper bound on Nash regret provides a principled fairness guarantee. We consider the stochastic linear bandits problem over a horizon of T rounds and with a set of arms X in ambient dimension d. Furthermore, we focus on settings in which the stochastic reward -- associated with each arm in …
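The definition in the abstract can be made concrete with a small sketch. It computes Nash regret (optimum minus the geometric mean of per-round expected rewards) next to the usual average regret (optimum minus the arithmetic mean). The function names and the sample reward sequence are illustrative assumptions, not taken from the paper.

```python
import math


def nash_regret(expected_rewards, optimum):
    """Optimum minus the geometric mean of the per-round expected rewards."""
    if any(r == 0 for r in expected_rewards):
        # A single zero-reward round drives the geometric mean to zero.
        return optimum
    # Average the logs rather than multiplying, to avoid underflow on long horizons.
    log_mean = sum(math.log(r) for r in expected_rewards) / len(expected_rewards)
    return optimum - math.exp(log_mean)


def average_regret(expected_rewards, optimum):
    """Optimum minus the arithmetic mean of the per-round expected rewards."""
    return optimum - sum(expected_rewards) / len(expected_rewards)


if __name__ == "__main__":
    # Hypothetical run: three good rounds and one poor round, with optimum 1.0.
    rewards = [0.9, 0.9, 0.1, 0.9]
    print("Nash regret:   ", nash_regret(rewards, 1.0))
    print("Average regret:", average_regret(rewards, 1.0))
```

By the AM-GM inequality the geometric mean never exceeds the arithmetic mean, so Nash regret is always at least the average regret. This is the sense in which it is a strengthened notion: a single poorly served round is penalized more heavily than it would be under averaging.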
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Auction Theory and Applications
