Boltzmann Exploration Done Right

Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, and Gergely Neu

arXiv:1705.10257 · cs.LG · November 8, 2017 · 26 cites

TL;DR

This paper critically examines Boltzmann exploration in multi-armed bandits, revealing its limitations with monotone learning rates and proposing novel non-monotone and arm-specific strategies that achieve near-optimal regret bounds.

Contribution

It introduces a non-monotone learning-rate schedule that is near-optimal when given prior knowledge of the problem, and a new variant with arm-specific learning rates that attains its regret bounds without such prior knowledge.

Findings

1. Monotone Boltzmann exploration induces suboptimal behavior.

2. A non-monotone schedule can achieve near-optimal performance with prior knowledge.

3. Arm-specific learning rates attain optimal regret bounds without prior knowledge.

Abstract

Boltzmann exploration is a classic strategy for sequential decision-making under uncertainty, and is one of the most standard tools in Reinforcement Learning (RL). Despite its widespread use, there is virtually no theoretical understanding about the limitations or the actual benefits of this exploration scheme. Does it drive exploration in a meaningful way? Is it prone to misidentifying the optimal actions or spending too much time exploring the suboptimal ones? What is the right tuning for the learning rate? In this paper, we address several of these questions in the classic setup of stochastic multi-armed bandits. One of our main results is showing that the Boltzmann exploration strategy with any monotone learning-rate sequence will induce suboptimal behavior. As a remedy, we offer a simple non-monotone schedule that guarantees near-optimal performance, albeit only when given prior…
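As a rough illustration of the strategy the abstract discusses, the sketch below assumes the standard softmax form of Boltzmann exploration, in which each arm is drawn with probability proportional to exp(eta_t × empirical mean). The function names, the Bernoulli reward model, and the monotone schedule eta_t = log(t + 1) used in the usage note are illustrative assumptions, not taken from the paper, and the code does not implement the paper's proposed variants.

```python
import math
import random

def boltzmann_probs(mean_estimates, eta):
    """Softmax over empirical means: p_i proportional to exp(eta * mean_i)."""
    # Subtract the largest estimate before exponentiating for numerical stability.
    top = max(mean_estimates)
    weights = [math.exp(eta * (m - top)) for m in mean_estimates]
    total = sum(weights)
    return [w / total for w in weights]

def run_boltzmann(true_means, horizon, eta_schedule, seed=0):
    """Play Bernoulli arms for `horizon` rounds; return the pull count of each arm.

    `eta_schedule` is a hypothetical callable mapping the round index t
    (starting at 1) to the learning rate used in that round.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        # Empirical mean of each arm (0.0 for arms that have not been pulled yet).
        means = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]
        probs = boltzmann_probs(means, eta_schedule(t))
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

For example, `run_boltzmann([0.3, 0.7], 1000, lambda t: math.log(t + 1))` simulates a two-armed problem under one monotone schedule. With eta = 0 the rule reduces to uniform sampling, and as eta grows it approaches greedy selection of the arm with the highest empirical mean.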

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management