A Note on KL-UCB+ Policy for the Stochastic Bandit

Junya Honda

arXiv:1903.07839·cs.LG·March 21, 2019·1 cites

TL;DR

This note proves the asymptotic optimality of the KL-UCB+ policy for the stochastic K-armed bandit, using the same techniques used to analyze other known policies. KL-UCB+ was already known to outperform KL-UCB empirically, but a regret bound for its original form had been missing.

Contribution

The note gives a simple proof of the asymptotic optimality of the KL-UCB+ policy, supplying the regret guarantee that was previously unknown for the policy's original form.

Findings

01 KL-UCB+ empirically outperforms KL-UCB

02 Asymptotic optimality of KL-UCB+ established

03 Proof uses techniques similar to those used for other bandit policies

Abstract

This note considers a classic setting of the stochastic K-armed bandit problem. In this problem, the KL-UCB policy is known to achieve the asymptotically optimal regret bound, and the KL-UCB+ policy empirically performs better than KL-UCB, although a regret bound for the original form of KL-UCB+ has been unknown. This note demonstrates that a simple proof of the asymptotic optimality of the KL-UCB+ policy can be given by the same techniques as those used in the analyses of other known policies.
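To make the difference between the two policies concrete, here is a minimal sketch of their indices. It rests on assumptions that the abstract does not state: rewards are Bernoulli, KL-UCB uses the exploration term log t (with any log log t correction dropped), and KL-UCB+ replaces it with log(t / N), where N is the number of pulls of the arm. The function names are hypothetical and chosen only for illustration.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_upper_index(mean, pulls, budget, iters=50):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= budget, found by bisection."""
    if budget <= 0:
        return mean
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if pulls * kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def kl_ucb_index(mean, pulls, t):
    # Assumed form: exploration budget log t.
    return kl_upper_index(mean, pulls, math.log(t))


def kl_ucb_plus_index(mean, pulls, t):
    # Assumed form: exploration budget log(t / pulls), which is smaller than log t.
    return kl_upper_index(mean, pulls, math.log(t / pulls))
```

Under these assumptions the KL-UCB+ index is never larger than the KL-UCB index for the same arm statistics, so the policy explores heavily sampled arms less aggressively. At each round, either policy would pull the arm with the largest index.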

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems