Almost Boltzmann Exploration

Harsh Gupta; Seo Taek Kong; R. Srikant; Weina Wang

arXiv:1901.08708 · cs.LG · April 23, 2019 · 1 citation



TL;DR

This paper introduces a modified Boltzmann exploration algorithm that achieves improved regret bounds in stochastic multi-armed bandit problems, including those with graph-structured feedback, and demonstrates strong empirical performance.

Contribution

A simple modification to Boltzmann exploration based on a variation of the doubling trick achieves better regret bounds and handles graph-structured feedback without prior knowledge.

Findings

01

Achieves $O(K \log^{1+\alpha} T)$ regret in stochastic MABs.

02

Performs as well as or better than state-of-the-art algorithms in experiments.

03

Effective in both traditional and graph-structured feedback settings.

Abstract

Boltzmann exploration is widely used in reinforcement learning to provide a trade-off between exploration and exploitation. Recently, in (Cesa-Bianchi et al., 2017) it has been shown that pure Boltzmann exploration does not perform well from a regret perspective, even in the simplest setting of stochastic multi-armed bandit (MAB) problems. In this paper, we show that a simple modification to Boltzmann exploration, motivated by a variation of the standard doubling trick, achieves $O(K \log^{1+\alpha} T)$ regret for a stochastic MAB problem with $K$ arms, where $\alpha > 0$ is a parameter of the algorithm. This improves on the result in (Cesa-Bianchi et al., 2017), where an algorithm inspired by the Gumbel-softmax trick achieves $O(K \log^{2} T)$ regret. We also show that our algorithm achieves $O(\beta(G) \log^{1+\alpha} T)$ regret in stochastic MAB problems with graph-structured feedback,…
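For orientation, the following is a minimal sketch of plain (unmodified) Boltzmann exploration on a Bernoulli bandit, the baseline the abstract says performs poorly from a regret perspective. It is not the paper's modified algorithm: the doubling-trick variation is not described in this summary and is not reproduced here. The function names, the fixed inverse temperature `eta`, the initial round-robin pulls, and the Bernoulli reward model are all assumptions made for illustration.

```python
import math
import random


def boltzmann_probs(means, eta):
    """Softmax over empirical means with inverse temperature eta (assumed fixed)."""
    m = max(means)  # subtract the max for numerical stability
    weights = [math.exp(eta * (x - m)) for x in means]
    total = sum(weights)
    return [w / total for w in weights]


def run_boltzmann_bandit(true_means, horizon, eta=5.0, seed=0):
    """Simulate plain Boltzmann exploration; returns (pull counts, pseudo-regret)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(true_means)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # assumption: pull each arm once to initialise the estimates
        else:
            means = [sums[i] / counts[i] for i in range(k)]
            probs = boltzmann_probs(means, eta)
            arm = rng.choices(range(k), weights=probs)[0]
        # Bernoulli reward with success probability true_means[arm]
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]
    return counts, regret


if __name__ == "__main__":
    counts, regret = run_boltzmann_bandit([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", counts)
    print("pseudo-regret:", round(regret, 2))
```

With a fixed `eta`, the sampling probabilities depend only on the current empirical means and not on how often each arm has been pulled, so an unlucky early estimate of the best arm may never be corrected.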


Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems