Global Bandits with Hölder Continuity
Onur Atan, Cem Tekin, Mihaela van der Schaar

TL;DR
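This paper introduces global bandits with Hölder continuity, a multi-armed bandit model in which all arms depend on a single global parameter, so that playing any arm reveals information about every other arm. Exploiting this shared structure enables faster learning and bounded regret. The setting is motivated by applications in which playing one arm is informative about the others.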
Contribution
It formalizes the Global Multi-Armed Bandit (GMAB) model, proposes a greedy policy with bounded parameter-dependent regret, and analyzes how the informativeness of the arms accelerates learning.
Findings
The greedy policy achieves bounded parameter-dependent regret.
The parameter-free regret is sublinear and improves as the arms become more informative.
Bayesian risk bounds are established, generalizing linear bandit results.
Abstract
Standard Multi-Armed Bandit (MAB) problems assume that the arms are independent. However, in many application scenarios, the information obtained by playing an arm provides information about the remainder of the arms. Hence, in such applications, this informativeness can and should be exploited to enable faster convergence to the optimal solution. In this paper, we introduce and formalize the Global MAB (GMAB), in which arms are globally informative through a global parameter, i.e., choosing an arm reveals information about all the arms. We propose a greedy policy for the GMAB which always selects the arm with the highest estimated expected reward, and prove that it achieves bounded parameter-dependent regret. Hence, this policy selects suboptimal arms only finitely many times, and after a finite number of initial time steps, the optimal arm is selected in all of the remaining time…
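The greedy policy described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the text above: the three reward functions, the Bernoulli reward model, the parameter range [0, 1], and the estimator (a play-count-weighted average of inverted sample means) are all hypothetical. The only part grounded in the abstract is the selection rule, which always plays the arm with the highest estimated expected reward.

```python
import random

def clip01(x):
    """Clamp a value to the assumed parameter range [0, 1]."""
    return min(1.0, max(0.0, x))

def make_arms():
    """Hypothetical arms: (mu, mu_inverse) pairs, each monotone in theta on [0, 1].

    The first arm, 1 - sqrt(theta), is Hoelder continuous with exponent 1/2.
    """
    return [
        (lambda th: 1.0 - th ** 0.5, lambda y: (1.0 - y) ** 2),
        (lambda th: 0.8 * th,        lambda y: y / 0.8),
        (lambda th: th ** 2,         lambda y: y ** 0.5),
    ]

def estimate_theta(counts, sums, arms):
    """Assumed estimator: invert each arm's sample mean, then average the
    per-arm estimates weighted by how often each arm was played."""
    total = sum(counts)
    if total == 0:
        return 0.5  # arbitrary initial guess before any observation
    est = 0.0
    for (_, mu_inv), n, s in zip(arms, counts, sums):
        if n > 0:
            est += (n / total) * clip01(mu_inv(s / n))
    return est

def run_greedy(theta_star, horizon, seed=0):
    """Play the arm with the highest estimated expected reward at every step."""
    rng = random.Random(seed)
    arms = make_arms()
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    choices = []
    for _ in range(horizon):
        theta_hat = estimate_theta(counts, sums, arms)
        # Greedy step: no explicit exploration, just the best arm under theta_hat.
        k = max(range(len(arms)), key=lambda i: arms[i][0](theta_hat))
        # Bernoulli reward whose mean is the arm's true expected reward.
        reward = 1.0 if rng.random() < arms[k][0](theta_star) else 0.0
        counts[k] += 1
        sums[k] += reward
        choices.append(k)
    return choices, estimate_theta(counts, sums, arms)

if __name__ == "__main__":
    choices, theta_hat = run_greedy(theta_star=0.9, horizon=2000)
    print("final estimate of theta:", round(theta_hat, 3))
    print("arm played in the last 10 rounds:", choices[-10:])
```

Because every arm's expected reward is a function of the same parameter, each observation refines the estimate used to rank all arms, which is why a purely greedy rule can be reasonable in this setting even though it would not be for independent arms.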
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications
