X-Armed Bandits

Sébastien Bubeck (INRIA Futurs), Rémi Munos (INRIA Lille - Nord Europe), Gilles Stoltz (DMA, GREGH, INRIA Paris - Rocquencourt), Csaba Szepesvari

arXiv:1001.4475·cs.LG·April 15, 2011

TL;DR

This paper introduces HOO, a hierarchical optimistic optimization algorithm for generalized stochastic bandits with a measurable arm space and locally Lipschitz mean-payoff functions, achieving near-optimal regret bounds.

Contribution

The paper develops HOO, an algorithm with improved regret bounds for complex arm spaces, including rates that are independent of the dimension under certain smoothness conditions.

Findings

01

HOO achieves near $\tilde{O}(\sqrt{n})$ regret in high-dimensional spaces.

02

The algorithm is minimax optimal when the dissimilarity is a metric.

03

Modified versions run in linearithmic time with similar regret guarantees.

Abstract

We consider a generalization of stochastic bandits where the set of arms, $\mathcal{X}$, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy, called HOO (hierarchical optimistic optimization), with improved regret bounds compared to previous results for a large class of problems. In particular, our results imply that if $\mathcal{X}$ is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally continuous with a known smoothness degree, then the expected regret of HOO is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of growth of the regret is independent of the dimension of the space. We also prove the…
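To make the idea of hierarchical optimistic optimization more concrete, here is a minimal sketch of a HOO-style policy on the one-dimensional arm space [0, 1]. It is an illustration, not the authors' reference implementation. The details below are assumptions not stated on this page: the binary partition of the interval, the upper bound U = mean + sqrt(2 ln n / count) + nu1 * rho^depth, the backed-up B-values, and the names `Node`, `hoo`, `pull`, `nu1`, and `rho`. The parameter values are arbitrary.

```python
import math
import random


class Node:
    """A cell [lo, hi] of the arm space at a given depth of the tree."""

    def __init__(self, depth, lo, hi):
        self.depth, self.lo, self.hi = depth, lo, hi
        self.count = 0        # number of pulls made inside this cell
        self.mean = 0.0       # empirical mean payoff of those pulls
        self.b = math.inf     # optimistic B-value
        self.left = None      # children not yet in the tree are None
        self.right = None


def b_value(node):
    # Cells that have not been added to the tree are maximally optimistic.
    return node.b if node is not None else math.inf


def hoo(pull, n_rounds, nu1=1.0, rho=0.5, seed=0):
    """Run a HOO-style policy; `pull(x)` returns a payoff in [0, 1]."""
    rng = random.Random(seed)
    root = Node(0, 0.0, 1.0)
    tree = [root]             # insertion order: parents before children
    arms = []
    for n in range(1, n_rounds + 1):
        # Descend along the larger B-value until leaving the current tree.
        node, path = root, [root]
        while True:
            bl, br = b_value(node.left), b_value(node.right)
            go_left = bl > br or (bl == br and rng.random() < 0.5)
            child = node.left if go_left else node.right
            if child is None:
                mid = (node.lo + node.hi) / 2
                if go_left:
                    child = node.left = Node(node.depth + 1, node.lo, mid)
                else:
                    child = node.right = Node(node.depth + 1, mid, node.hi)
                tree.append(child)
                path.append(child)
                break
            path.append(child)
            node = child
        # Play an arbitrary arm in the new cell and observe its payoff.
        x = rng.uniform(child.lo, child.hi)
        y = pull(x)
        arms.append(x)
        for v in path:
            v.count += 1
            v.mean += (y - v.mean) / v.count
        # Recompute all B-values bottom-up (children before parents).
        for v in reversed(tree):
            u = v.mean + math.sqrt(2 * math.log(n) / v.count) + nu1 * rho ** v.depth
            v.b = min(u, max(b_value(v.left), b_value(v.right)))
    return arms, root


if __name__ == "__main__":
    # Hypothetical noiseless mean-payoff function with its maximum at x = 0.3.
    arms, root = hoo(lambda x: 1.0 - abs(x - 0.3), 200)
    print(len(arms), root.count)
```

Because every B-value is recomputed in each round, this sketch takes time quadratic in the number of rounds. It does not implement the faster modified versions mentioned in the findings.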

Taxonomy

Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Auction Theory and Applications