Fighting Bandits with a New Kind of Smoothness

Jacob Abernethy; Chansoo Lee; Ambuj Tewari

arXiv:1512.04152 · cs.LG · December 15, 2015 · 20 cites

Open Access

TL;DR

This paper introduces a new family of algorithms for adversarial multi-armed bandits using convex smoothing, demonstrating near-optimal regret bounds with various regularization and perturbation methods.

Contribution

It presents a novel analysis technique and shows that Tsallis entropy regularization and certain perturbation methods achieve optimal or near-optimal regret bounds.

Findings

01

Tsallis entropy regularization achieves $O(\sqrt{TN})$ regret, matching the minimax rate.

02

Perturbation methods with bounded hazard rate attain $O(\sqrt{TN \log N})$ regret.

03

Various distributions like Gumbel and Weibull satisfy the bounded hazard rate condition.
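
The bounded hazard rate condition in the third finding can be checked numerically. The sketch below assumes the usual definition of the hazard rate, h(x) = f(x) / (1 - F(x)) for density f and CDF F, and evaluates it for a standard Gumbel distribution on a grid. The function names, the grid, and the Gaussian comparison are illustrative choices, not taken from the paper.

```python
import math


def gumbel_hazard(x: float) -> float:
    """Hazard rate f(x) / (1 - F(x)) of the standard Gumbel distribution.

    With u = exp(-x): F = exp(-u) and f = u * exp(-u), so the ratio
    simplifies to u / (exp(u) - 1). expm1 keeps this accurate for tiny u.
    Assumption: x stays above roughly -6 so that exp(u) does not overflow.
    """
    u = math.exp(-x)
    return u / math.expm1(u)


def gaussian_hazard(x: float) -> float:
    """Hazard rate of the standard normal, using erfc for the upper tail."""
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    survival = 0.5 * math.erfc(x / math.sqrt(2.0))
    return pdf / survival


def sup_on_grid(hazard, lo: float, hi: float, steps: int = 1000) -> float:
    """Largest hazard value on an evenly spaced grid over [lo, hi]."""
    return max(hazard(lo + (hi - lo) * i / steps) for i in range(steps + 1))


if __name__ == "__main__":
    print("Gumbel sup on [-5, 30]:", sup_on_grid(gumbel_hazard, -5.0, 30.0))
    print("Gaussian sup on [-5, 30]:", sup_on_grid(gaussian_hazard, -5.0, 30.0))
```

On this grid the Gumbel hazard stays below 1, while the Gaussian hazard keeps growing with x. The Gaussian appears here only as a contrast case with an unbounded hazard rate.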

Abstract

We define a novel family of algorithms for the adversarial multi-armed bandit problem, and provide a simple analysis technique based on convex smoothing. We prove two main results. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, achieves the $\Theta(\sqrt{TN})$ minimax regret. Second, we show that a wide class of perturbation methods achieve a near-optimal regret as low as $O(\sqrt{TN \log N})$ if the perturbation distribution has a bounded hazard rate. For example, the Gumbel, Weibull, Fréchet, Pareto, and Gamma distributions all satisfy this key property.
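
The remark that Tsallis entropy regularization includes EXP3 as a special case can be illustrated numerically. The sketch below uses the standard definition S_α(p) = (1 - Σ_i p_i^α) / (α - 1), which may differ from the paper's normalization, and checks that it approaches the Shannon entropy (the regularizer behind EXP3's exponential weights) as α tends to 1. The function names and the sample distribution are hypothetical.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy -sum p_i log p_i, skipping zero-probability arms."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def tsallis_entropy(p: Sequence[float], alpha: float) -> float:
    """Standard Tsallis entropy (1 - sum p_i^alpha) / (alpha - 1), alpha > 0.

    Assumption: this is the textbook form; the paper may scale it differently.
    The alpha == 1 case is defined by its limit, the Shannon entropy.
    """
    if alpha == 1.0:
        return shannon_entropy(p)
    return (1.0 - sum(q ** alpha for q in p)) / (alpha - 1.0)


if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]  # hypothetical distribution over three arms
    for alpha in (0.5, 0.9, 0.99, 0.999):
        print(alpha, tsallis_entropy(p, alpha))
    print("Shannon:", shannon_entropy(p))
```

Running the script shows the Tsallis values moving toward the Shannon value as α approaches 1, which is the sense in which the Shannon-entropy case sits inside the Tsallis family.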

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference