Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses

Xuedong Shang; Han Shao; Jian Qian

arXiv:2010.08061·cs.LG·October 19, 2020

TL;DR

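This paper studies multi-armed bandit problems in which each arm incurs several losses at once. The objective is to minimize the $\ell^\infty$-norm of the relative loss vector, that is, the largest relative loss across all loss dimensions. The paper develops lower bounds and algorithms for both best-arm identification and regret minimization.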

Contribution

It introduces a framework for multi-loss bandits based on relative loss vectors, derives lower bounds for both fixed-confidence best-arm identification and regret minimization, and gives a matching algorithm for the regret objective.

Findings

01

Derived a problem-dependent sample complexity lower bound for best-arm identification.

02

Established a regret lower bound of order $T^{2/3}$ and proposed a matching algorithm.

03

Analyzed the minimax …

Abstract

Multi-armed bandits are widely applied in scenarios like recommender systems, for which the goal is to maximize the click rate. However, more factors should be considered, e.g., user stickiness, user growth rate, user experience assessment, etc. In this paper, we model this situation as a $K$-armed bandit problem with multiple losses. We define the relative loss vector of an arm, where the $i$-th entry compares the arm and the optimal arm with respect to the $i$-th loss. We study two goals: (a) finding the arm with the minimum $\ell^\infty$-norm of relative losses with a given confidence level (which refers to fixed-confidence best-arm identification); (b) minimizing the $\ell^\infty$-norm of cumulative relative losses (which refers to regret minimization). For goal (a), we derive a problem-dependent sample complexity lower bound and discuss how to achieve matching algorithms. For goal…
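
To make the two objectives in the abstract concrete, here is a minimal Python sketch. It assumes the mean losses of every arm are known, which a bandit algorithm would of course have to estimate from samples. It also assumes that "compares" means a difference: the $i$-th relative loss of an arm is its $i$-th mean loss minus the smallest $i$-th mean loss over all arms. The abstract does not spell out this formula, so treat it as one plausible reading. The function names `relative_losses`, `best_arm`, and `cumulative_regret` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def relative_losses(mean_losses: Sequence[Sequence[float]]) -> List[List[float]]:
    """Relative loss vector of every arm.

    mean_losses[k][i] is the expected i-th loss of arm k.
    ASSUMPTION: entry i is the gap to the best arm for loss i alone.
    """
    num_losses = len(mean_losses[0])
    # Smallest achievable mean for each loss dimension, taken separately.
    best = [min(arm[i] for arm in mean_losses) for i in range(num_losses)]
    return [[arm[i] - best[i] for i in range(num_losses)] for arm in mean_losses]


def best_arm(mean_losses: Sequence[Sequence[float]]) -> int:
    """Index of the arm whose relative loss vector has the smallest l-infinity norm."""
    rel = relative_losses(mean_losses)
    # Entries are nonnegative under the assumption above, so max() is the norm.
    norms = [max(vec) for vec in rel]
    return min(range(len(norms)), key=norms.__getitem__)


def cumulative_regret(mean_losses: Sequence[Sequence[float]],
                      pulls: Sequence[int]) -> float:
    """l-infinity norm of the summed (expected) relative losses of the pulled arms."""
    rel = relative_losses(mean_losses)
    num_losses = len(rel[0])
    totals = [sum(rel[k][i] for k in pulls) for i in range(num_losses)]
    return max(totals)


# Three arms, two losses; values chosen to be exact in binary floating point.
example = [[0.25, 1.0], [0.75, 0.125], [0.5, 0.5]]
print(relative_losses(example))
print(best_arm(example))
print(cumulative_regret(example, [2, 2, 2, 2]))
```

In this toy instance, arm 0 is best for the first loss and arm 1 is best for the second, yet the balanced arm 2 has the smallest worst-case gap, which is what goal (a) asks to identify. Goal (b) takes the norm after summing over rounds, so it depends on the whole sequence of pulls rather than on each pull in isolation.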

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Optimization and Search Problems