Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms

Stefan Magureanu; Richard Combes; Alexandre Proutiere

arXiv:1405.4758·cs.LG·May 20, 2014·62 cites

TL;DR

This paper establishes regret lower bounds and introduces optimal algorithms for Lipschitz bandit problems, demonstrating their effectiveness in both discrete and continuous settings through theoretical analysis and numerical experiments.

Contribution

The paper provides the first asymptotic regret lower bounds for Lipschitz bandits and proposes algorithms, one of which (OSLB) is proven asymptotically optimal; the approach extends to continuous arm sets and to contextual bandits with similarities.

Findings

01. OSLB is asymptotically optimal for discrete Lipschitz bandits.

02. Discretization combined with OSLB or CKL-UCB outperforms existing methods.

03. Algorithms extend effectively to contextual bandits with similarities.

Abstract

We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem. In fact, we prove that OSLB is asymptotically optimal, as its asymptotic regret matches the lower bound. The regret analysis of our algorithms relies on a new concentration inequality for weighted sums of KL divergences between the empirical distributions of rewards and their true distributions. For continuous Lipschitz bandits, we propose to first discretize the action space, and then apply OSLB or CKL-UCB, algorithms that provably exploit the structure efficiently. This approach is…
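The discretize-then-apply idea in the abstract can be illustrated with a minimal Python sketch. It is only a sketch under stated assumptions: the index policy below is plain UCB1, used as a stand-in because the definitions of OSLB and CKL-UCB are not given on this page. The evenly spaced grid, the Bernoulli rewards, the example reward function `mu`, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def discretize(num_arms):
    """Evenly spaced arm positions on [0, 1] (an assumed grid choice)."""
    if num_arms == 1:
        return [0.5]
    return [k / (num_arms - 1) for k in range(num_arms)]


def is_lipschitz(arms, means, lipschitz_const):
    """Check |mu(x) - mu(y)| <= L * |x - y| for every pair of arms."""
    tol = 1e-12  # slack for floating-point rounding
    return all(
        abs(means[i] - means[j]) <= lipschitz_const * abs(arms[i] - arms[j]) + tol
        for i in range(len(arms))
        for j in range(i + 1, len(arms))
    )


def run_ucb1(means, horizon, rng):
    """Plain UCB1 on Bernoulli arms. A stand-in, NOT OSLB or CKL-UCB.

    Returns the per-arm play counts and the cumulative pseudo-regret
    measured against the best arm on the grid.
    """
    n = len(means)
    counts = [0] * n
    sums = [0.0] * n
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1  # initialization: play each arm once
        else:
            arm = max(
                range(n),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return counts, regret


def mu(x):
    """Hypothetical expected reward: Lipschitz with constant 0.5 on [0, 1]."""
    return 0.8 - 0.5 * abs(x - 0.3)


if __name__ == "__main__":
    arms = discretize(11)
    means = [mu(x) for x in arms]
    counts, regret = run_ucb1(means, horizon=2000, rng=random.Random(0))
    print("Lipschitz on grid:", is_lipschitz(arms, means, 0.5))
    print("plays per arm:", counts)
    print("pseudo-regret vs. best grid arm:", round(regret, 2))
```

Note that UCB1 ignores the Lipschitz structure entirely; according to the abstract, exploiting that structure is exactly what OSLB and CKL-UCB add. The regret here is also measured against the best arm on the grid, so a coarser grid may hide an additional discretization error relative to the continuous optimum.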

Taxonomy

Topics: Advanced Bandit Algorithms Research · Decision-Making and Behavioral Economics · Risk and Portfolio Optimization