p-Mean Regret for Stochastic Bandits

Anand Krishna; Philips George John; Adarsh Barik; Vincent Y. F. Tan

arXiv:2412.10751 · cs.LG · December 18, 2024

TL;DR

This paper introduces a flexible $p$-mean regret framework for stochastic bandits, providing a unified UCB-based algorithm with new bounds that balance fairness and efficiency across different $p$ values.

Contribution

The work extends $p$-mean welfare to bandit regret, proposing a simple unified algorithm with novel bounds applicable to a range of $p$ values, including Nash regret.

Findings

01

Achieves $p$-mean regret bounds of $\tilde{O}(\sqrt{T^{1/2|p|}})$ for $p \neq 0$

02

Matches lower bounds for $0 < p < 1$ up to logarithmic factors

03

Unifies analysis for average and Nash regret with a single algorithm.
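As a rough illustration of how one objective can cover both cases, the sketch below computes the standard generalized (power) mean and the regret it induces. It assumes strictly positive expected rewards and treats $p = 0$ as the geometric mean (the Nash case) and $p = 1$ as the arithmetic mean (the average case). The function names `p_mean` and `p_mean_regret` are hypothetical and are not taken from the paper or its repository.

```python
import math


def p_mean(values, p):
    """Generalized (power) mean of strictly positive values.

    p = 1 gives the arithmetic mean, p = 0 the geometric mean,
    and p = -1 the harmonic mean.
    """
    n = len(values)
    if p == 0:
        # Limit of the power mean as p -> 0: the geometric mean.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def p_mean_regret(best_mean, expected_rewards, p):
    """Optimal arm mean minus the p-mean of the per-round expected rewards."""
    return best_mean - p_mean(expected_rewards, p)
```

For two rounds with expected rewards 0.2 and 0.8 and an optimal mean of 0.8, the regret is 0.3 at $p = 1$, 0.4 at $p = 0$, and 0.48 at $p = -1$: a smaller $p$ penalizes the low-reward round more heavily.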

Abstract

In this work, we extend the concept of the $p$-mean welfare objective from social choice theory (Moulin 2004) to study $p$-mean regret in stochastic multi-armed bandit problems. The $p$-mean regret, defined as the difference between the optimal mean among the arms and the $p$-mean of the expected rewards, offers a flexible framework for evaluating bandit algorithms, enabling algorithm designers to balance fairness and efficiency by adjusting the parameter $p$. Our framework encompasses both average cumulative regret and Nash regret as special cases. We introduce a simple, unified UCB-based algorithm (Explore-Then-UCB) that achieves novel $p$-mean regret bounds. Our algorithm consists of two phases: a carefully calibrated uniform exploration phase to initialize sample means, followed by the UCB1 algorithm of Auer, Cesa-Bianchi, and Fischer (2002). Under mild assumptions, we prove that…
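The two-phase structure described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes Bernoulli rewards, round-robin pulls for the uniform exploration phase, and the textbook UCB1 index (sample mean plus $\sqrt{2 \ln t / n_i}$). The abstract does not state how the exploration length is calibrated, so `explore_per_arm` is a hypothetical free parameter, as are the function and argument names.

```python
import math
import random


def explore_then_ucb(arm_means, horizon, explore_per_arm, seed=0):
    """Return the sequence of arms pulled over `horizon` rounds.

    arm_means:       true Bernoulli means, used only to simulate rewards.
    explore_per_arm: hypothetical exploration budget per arm (assumption).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # total observed reward per arm
    pulls = []

    def pull(i):
        reward = 1.0 if rng.random() < arm_means[i] else 0.0
        counts[i] += 1
        sums[i] += reward
        pulls.append(i)

    # Phase 1: uniform exploration (round-robin) to initialize sample means.
    for t in range(min(horizon, k * explore_per_arm)):
        pull(t % k)

    # Phase 2: UCB1 on the statistics gathered so far.
    while len(pulls) < horizon:
        untried = [i for i in range(k) if counts[i] == 0]
        if untried:
            # Only reachable if phase 1 did not cover every arm.
            pull(untried[0])
            continue
        t = len(pulls) + 1
        index = [
            sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
            for i in range(k)
        ]
        pull(max(range(k), key=index.__getitem__))

    return pulls
```

A call such as `explore_then_ucb([0.2, 0.8], horizon=2000, explore_per_arm=10)` alternates between the two arms for the first 20 rounds and then lets the UCB1 index decide, which should concentrate most later pulls on the better arm.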

Code & Models

Repositories

philips-george/p-mean-regret-stochastic-bandits (Official)

Videos

p-Mean Regret for Stochastic Bandits · underline

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Distributed Sensor Networks and Detection Algorithms