Global Bandits with Hölder Continuity

Onur Atan; Cem Tekin; Mihaela van der Schaar

arXiv:1410.7890·cs.LG·October 30, 2014·2 cites

TL;DR

This paper introduces the Global Bandits with Hölder Continuity model, in which all arms depend on a single global parameter, so playing any one arm is informative about every other arm. This shared structure enables faster learning and bounded regret, with applications in various fields.

Contribution

It formalizes the Global MAB (GMAB) model, proposes a greedy policy with bounded regret, and analyzes how the informativeness of the arms accelerates learning.

Findings

1. The greedy policy achieves bounded parameter-dependent regret.

2. Parameter-free regret is sublinear and improves with informativeness.

3. Bayesian risk bounds are established, generalizing linear bandit results.

Abstract

Standard Multi-Armed Bandit (MAB) problems assume that the arms are independent. However, in many application scenarios, the information obtained by playing an arm provides information about the remainder of the arms. Hence, in such applications, this informativeness can and should be exploited to enable faster convergence to the optimal solution. In this paper, we introduce and formalize the Global MAB (GMAB), in which arms are globally informative through a global parameter, i.e., choosing an arm reveals information about all the arms. We propose a greedy policy for the GMAB which always selects the arm with the highest estimated expected reward, and prove that it achieves bounded parameter-dependent regret. Hence, this policy selects suboptimal arms only finitely many times, and after a finite number of initial time steps, the optimal arm is selected in all of the remaining time…
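
The greedy rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two reward functions and their inverses, the Bernoulli reward model, and the pull-count-weighted estimate of the global parameter are all assumptions made for this example. The abstract itself states only that the policy always selects the arm with the highest estimated expected reward.

```python
import random

# Hypothetical arms: each maps a global parameter theta in [0, 1] to an
# expected reward. Each entry is (mu, mu_inverse). These functions are
# illustrative assumptions, not taken from the paper.
ARMS = [
    (lambda th: 1.0 - th, lambda r: 1.0 - r),
    (lambda th: th ** 2, lambda r: r ** 0.5),
]


def greedy_global_bandit(theta_true, horizon, seed=0):
    """Run a greedy policy that shares one parameter estimate across arms."""
    rng = random.Random(seed)
    counts = [0] * len(ARMS)   # number of pulls per arm
    sums = [0.0] * len(ARMS)   # total reward per arm
    choices = []
    for _ in range(horizon):
        total = sum(counts)
        if total == 0:
            arm = 0  # arbitrary first pull (assumption)
        else:
            # Invert each pulled arm's sample mean to get a per-arm estimate
            # of theta, then average the estimates weighted by pull counts.
            theta_hat = sum(
                counts[k] * inv(sums[k] / counts[k])
                for k, (_, inv) in enumerate(ARMS)
                if counts[k] > 0
            ) / total
            # Greedy step: pick the arm with the highest estimated mean reward.
            arm = max(range(len(ARMS)), key=lambda k: ARMS[k][0](theta_hat))
        # Bernoulli reward drawn with the arm's true expected reward.
        reward = 1.0 if rng.random() < ARMS[arm][0](theta_true) else 0.0
        counts[arm] += 1
        sums[arm] += reward
        choices.append(arm)
    return choices, counts


choices, counts = greedy_global_bandit(theta_true=0.9, horizon=500)
```

With `theta_true=0.9` the second arm has the higher expected reward (0.81 versus 0.1). Because every pull updates the shared estimate `theta_hat`, the sketch needs no separate exploration step, which is the intuition behind the finite number of suboptimal selections claimed in the abstract.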

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications