Maximal Objectives in the Multi-armed Bandit with Applications

Eren Ozbay; Vijay Kamble

arXiv:2006.06853·cs.LG·October 16, 2024

TL;DR

This paper introduces a new objective for the multi-armed bandit problem focused on maximizing the highest total reward among arms, providing theoretical regret bounds and an adaptive policy, with applications to online platform participant management.

Contribution

The paper proposes a "max" objective for the multi-armed bandit, derives regret lower bounds for it, and develops an adaptive explore-then-commit policy that outperforms natural alternatives in numerical experiments.

Findings

01

Under the max objective, any policy incurs an instance-dependent asymptotic regret of $\Omega(\log T)$ and a worst-case regret of $\Omega(K^{1/3} T^{2/3})$.

02

An adaptive explore-then-commit policy achieves near-optimal regret bounds.

03

Numerical experiments show the policy's effectiveness over alternatives.

Abstract

In several applications of the stochastic multi-armed bandit problem, the traditional objective of maximizing the expected total reward can be inappropriate. In this paper, motivated by certain operational concerns in online platforms, we consider a new objective in the classical setup. Given $K$ arms, instead of maximizing the expected total reward from $T$ pulls (the traditional "sum" objective), we consider the vector of total rewards earned from each of the $K$ arms at the end of $T$ pulls and aim to maximize the expected highest total reward across arms (the "max" objective). For this objective, we show that any policy must incur an instance-dependent asymptotic regret of $\Omega(\log T)$ (with a higher instance-dependent constant compared to the traditional objective) and a worst-case regret of $\Omega(K^{1/3} T^{2/3})$. We then design an adaptive explore-then-commit policy…
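To make the difference between the "sum" and "max" objectives concrete, the following is a minimal sketch, not the paper's policy or code. It assumes Bernoulli rewards, and the names `run_pulls`, `means`, and `pulls` are hypothetical, introduced only for illustration. It computes both objectives for a fixed sequence of pulls.

```python
import random


def run_pulls(means, pulls, seed=0):
    """Simulate one run and return (sum_objective, max_objective).

    means: per-arm success probabilities (Bernoulli rewards are an assumption).
    pulls: sequence of arm indices, one entry per pull (length T).
    """
    rng = random.Random(seed)
    totals = [0.0] * len(means)  # total reward earned from each arm
    for arm in pulls:
        reward = 1.0 if rng.random() < means[arm] else 0.0
        totals[arm] += reward
    # "sum" objective: total reward over all arms.
    # "max" objective: highest total reward earned from any single arm.
    return sum(totals), max(totals)


if __name__ == "__main__":
    T = 10
    means = [1.0, 1.0]  # deterministic rewards keep the example exact
    round_robin = [t % 2 for t in range(T)]  # alternate between the two arms
    commit = [0] * T                         # always pull arm 0
    print(run_pulls(means, round_robin))
    print(run_pulls(means, commit))
```

With two identical arms, alternating and committing earn the same sum, but splitting pulls halves the max. This illustrates why spreading pulls across arms is costly under the max objective even when it is harmless under the sum objective.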

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications