Global Bandits

Onur Atan; Cem Tekin; Mihaela van der Schaar

arXiv:1503.08370·cs.LG·March 22, 2018

TL;DR

This paper introduces global bandits, a class of multi-armed bandit models in which the rewards of all arms depend on a single unknown parameter, and proposes policies with bounded parameter-dependent regret and sub-linear worst-case regret.

Contribution

The paper defines global bandits, in which the rewards of all arms are correlated, and develops algorithms that achieve bounded and sub-linear regret in this setting.

Findings

01

The greedy policy achieves bounded regret depending on the true parameter.

02

A variant attains $\tilde{O}(\sqrt{T})$ worst-case regret.

03

Experiments show significant gains in dynamic pricing applications.

Abstract

Multi-armed bandits (MAB) model sequential decision making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior work on MAB assumes that the reward distributions of each arm are independent. But in a wide variety of decision problems -- from drug dosage to dynamic pricing -- the expected rewards of different arms are correlated, so that selecting one arm provides information about the expected rewards of other arms as well. We propose and analyze a class of models of such decision problems, which we call global bandits. In the case in which rewards of all arms are deterministic functions of a single unknown parameter, we construct a greedy policy that achieves bounded regret, with a bound that depends on the single true parameter of the problem. Hence, this policy selects…
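
As an illustration of why one shared parameter makes a greedy rule plausible, the following is a minimal sketch and not the authors' algorithm. It assumes the parameter lies in [0, 1], that each arm's mean reward is a known invertible function of it, and that noise is bounded and uniform. The arm functions in `ARMS`, the weighting of per-arm estimates by pull counts, and every name in the block are hypothetical choices made for this example.

```python
import random

# Hypothetical arms: (known mean-reward function of theta, its inverse).
ARMS = [
    (lambda th: th,             lambda x: x),
    (lambda th: 1.0 - th,       lambda x: 1.0 - x),
    (lambda th: 0.5 + 0.2 * th, lambda x: (x - 0.5) / 0.2),
]


def clip01(v):
    """Keep an estimate inside the assumed parameter range [0, 1]."""
    return min(1.0, max(0.0, v))


def estimate_theta(sums, counts):
    """Invert each pulled arm's sample mean, then average by pull share."""
    total = sum(counts)
    est = 0.0
    for (_, inv), s, n in zip(ARMS, sums, counts):
        if n > 0:
            est += (n / total) * clip01(inv(s / n))
    return est


def greedy_global_bandit(theta_true, horizon, noise=0.1, seed=0):
    """Play the arm that looks best under the current estimate of theta."""
    rng = random.Random(seed)
    sums = [0.0] * len(ARMS)
    counts = [0] * len(ARMS)
    choices = []
    for t in range(horizon):
        if t == 0:
            arm = 0  # arbitrary first pull: no estimate exists yet
        else:
            theta_hat = estimate_theta(sums, counts)
            arm = max(range(len(ARMS)), key=lambda k: ARMS[k][0](theta_hat))
        # Assumed reward model: known mean plus bounded uniform noise.
        reward = ARMS[arm][0](theta_true) + rng.uniform(-noise, noise)
        sums[arm] += reward
        counts[arm] += 1
        choices.append(arm)
    return choices, estimate_theta(sums, counts)


if __name__ == "__main__":
    choices, theta_hat = greedy_global_bandit(theta_true=0.2, horizon=200)
    print("arms played in the first five rounds:", choices[:5])
    print("final estimate of theta:", round(theta_hat, 3))
```

With `theta_true=0.2` and the default noise level, the single initial pull of arm 0 already places the estimate near 0.2, so the sketch switches to arm 1 (the optimal arm here) without ever sampling arm 2. That is the sense in which pulling one arm is informative about the others.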

Taxonomy

Topics

Advanced Bandit Algorithms Research