Nash Regret Guarantees for Linear Bandits

Ayush Sawarni, Soumyabrata Pal, and Siddharth Barman

arXiv:2310.02023 · cs.LG · October 4, 2023

Open Access

TL;DR

This paper establishes essentially tight upper bounds on Nash regret in stochastic linear bandits, a performance measure that ties a bandit algorithm to fairness and collective welfare across rounds. It gives algorithms for both finite and infinite arm sets under sub-Poisson reward assumptions.

Contribution

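It develops a new algorithm that achieves near-optimal Nash regret bounds for stochastic linear bandits. Because Nash regret is measured against the Nash social welfare (the geometric mean of expected rewards), these bounds double as fairness guarantees; the analysis relies on additional technical tools.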

Findings

01

Achieves Nash regret of $O\left(\sqrt{\frac{d\nu}{T}}\,\log(T|X|)\right)$ for finite arm sets.

02

Provides a Nash regret bound of $O\left(\frac{d^{5/4}\nu^{1/2}}{\sqrt{T}}\,\log(T)\right)$ for infinite arm sets.

03

The results hold for bounded, positive rewards, so they cover a broad class of reward distributions.
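
To get a feel for how the two bounds in the findings scale, the sketch below evaluates their leading terms numerically. It is only an illustration: the constants hidden by the O-notation are dropped (an assumption), and the function names and the sample values of `d`, `nu`, `T`, and `num_arms` are hypothetical choices, not values from the paper.

```python
import math


def finite_arm_bound(d, nu, T, num_arms):
    """Leading term sqrt(d*nu/T) * log(T*|X|); hidden constants are omitted."""
    return math.sqrt(d * nu / T) * math.log(T * num_arms)


def infinite_arm_bound(d, nu, T):
    """Leading term d^(5/4) * nu^(1/2) / sqrt(T) * log(T); constants omitted."""
    return d ** 1.25 * math.sqrt(nu) / math.sqrt(T) * math.log(T)


# Hypothetical instance, chosen only for illustration.
d, nu, T, num_arms = 10, 1.0, 10**6, 1000

finite_value = finite_arm_bound(d, nu, T, num_arms)
infinite_value = infinite_arm_bound(d, nu, T)

print(f"finite-arm leading term:   {finite_value:.4f}")
print(f"infinite-arm leading term: {infinite_value:.4f}")
```

Both terms shrink roughly like $1/\sqrt{T}$ up to logarithmic factors. The finite-arm term depends on the number of arms only through $\log|X|$, while the infinite-arm term pays a larger $d^{5/4}$ factor in the dimension.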

Abstract

We obtain essentially tight upper bounds for a strengthened notion of regret in the stochastic linear bandits framework. The strengthening -- referred to as Nash regret -- is defined as the difference between the (a priori unknown) optimum and the geometric mean of expected rewards accumulated by the linear bandit algorithm. Since the geometric mean corresponds to the well-studied Nash social welfare (NSW) function, this formulation quantifies the performance of a bandit algorithm as the collective welfare it generates across rounds. NSW is known to satisfy fairness axioms and, hence, an upper bound on Nash regret provides a principled fairness guarantee. We consider the stochastic linear bandits problem over a horizon of $T$ rounds and with set of arms $X$ in ambient dimension $d$. Furthermore, we focus on settings in which the stochastic reward -- associated with each arm in $X$ …
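
As a rough illustration of the definition in the abstract, the sketch below computes Nash regret (optimum minus the geometric mean of per-round expected rewards) next to the usual average regret (optimum minus the arithmetic mean). The function names and the sample reward sequence are hypothetical, and the per-round expected rewards are assumed to be known here, which they would not be in an actual bandit run.

```python
import math


def nash_regret(optimal_mean, expected_rewards):
    """Optimum minus the geometric mean of per-round expected rewards."""
    if any(r < 0 for r in expected_rewards):
        raise ValueError("expected rewards are assumed to be non-negative")
    if any(r == 0 for r in expected_rewards):
        # A single zero-reward round drives the geometric mean to zero.
        return optimal_mean
    # Average the logs instead of multiplying, to avoid underflow for long horizons.
    log_mean = sum(math.log(r) for r in expected_rewards) / len(expected_rewards)
    return optimal_mean - math.exp(log_mean)


def average_regret(optimal_mean, expected_rewards):
    """Optimum minus the arithmetic mean of per-round expected rewards."""
    return optimal_mean - sum(expected_rewards) / len(expected_rewards)


# Hypothetical run: three good rounds and one poor round, with optimum 1.0.
rewards = [0.9, 0.9, 0.9, 0.1]
nr = nash_regret(1.0, rewards)
ar = average_regret(1.0, rewards)
print(f"Nash regret:    {nr:.4f}")
print(f"average regret: {ar:.4f}")
```

By the AM-GM inequality the geometric mean never exceeds the arithmetic mean, so Nash regret is at least the average regret. This is the sense in which it is a strengthened notion: a few rounds with very low expected reward are penalized more heavily.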

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Auction Theory and Applications
