Adaptive KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings

Arghyadip Roy; Sanjay Shakkottai; R. Srikant

arXiv:2009.06606·cs.LG·October 11, 2022·1 cites

Open Access

TL;DR

This paper introduces an adaptive bandit algorithm that distinguishes between Markovian and i.i.d. rewards, switching between KL-UCB variants to achieve low regret in both settings.

Contribution

The paper proposes a novel algorithm that detects whether each arm's rewards are Markovian or i.i.d. and switches KL-UCB variants accordingly, improving regret guarantees across both reward models.

Findings

01 Achieves logarithmic regret in Markovian and i.i.d. settings.

02 Effectively distinguishes reward types using total variation distance.

03 Switches between KL-UCB variants for optimal performance.
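
To make the total variation idea concrete, here is a minimal sketch of one way such a check could look. It is an illustration only, not the paper's test: it assumes finite-valued rewards, and the helper names and the `threshold` parameter are hypothetical. The intuition is that for i.i.d. rewards the next-reward distribution does not depend on the previous reward, so all rows of the empirical transition matrix should be close in total variation distance.

```python
from collections import Counter

def tv_distance(p, q):
    """Total variation distance between two finite distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def empirical_transition_rows(rewards):
    """Estimate P(next reward | previous reward) from one arm's reward sequence."""
    counts = {}
    for prev, nxt in zip(rewards, rewards[1:]):
        counts.setdefault(prev, Counter())[nxt] += 1
    return {
        state: {x: n / sum(c.values()) for x, n in c.items()}
        for state, c in counts.items()
    }

def looks_markovian(rewards, threshold):
    """Hypothetical rule: flag the arm as Markovian if any two transition rows
    differ by more than `threshold` in total variation distance."""
    rows = list(empirical_transition_rows(rewards).values())
    worst = max(
        (tv_distance(a, b) for i, a in enumerate(rows) for b in rows[i + 1:]),
        default=0.0,
    )
    return worst > threshold
```

A strictly alternating sequence such as `[0, 1] * 10` has transition rows that disagree completely, so it is flagged for any threshold below 1. How the threshold should shrink with the number of samples is exactly the kind of detail the paper's analysis would fix and this sketch does not.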

Abstract

In the regret-based formulation of Multi-armed Bandit (MAB) problems, except in rare instances, much of the literature focuses on arms with i.i.d. rewards. In this paper, we consider the problem of obtaining regret guarantees for MAB problems in which the rewards of each arm form a Markov chain which may not belong to a single parameter exponential family. To achieve a logarithmic regret in such problems is not difficult: a variation of standard Kullback-Leibler Upper Confidence Bound (KL-UCB) does the job. However, the constants obtained from such an analysis are poor for the following reason: i.i.d. rewards are a special case of Markov rewards and it is difficult to design an algorithm that works well independent of whether the underlying model is truly Markovian or i.i.d. To overcome this issue, we introduce a novel algorithm that identifies whether the rewards from each arm are…
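
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard KL-UCB index for i.i.d. Bernoulli rewards, computed by bisection. It is not the Markovian variant the paper analyzes; the function names and the exploration constant `c` are assumptions made for illustration.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=0.0, tol=1e-6):
    """Approximate the largest q in [mean, 1] with
    pulls * KL(mean, q) <= log(t) + c * log(log(t))."""
    budget = (math.log(t) + c * math.log(max(math.log(t), 1.0))) / pulls
    lo, hi = mean, 1.0
    # KL(mean, q) is increasing in q for q >= mean, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

At each round the algorithm would pull the arm with the largest index. An arm with few pulls gets a wide confidence interval and hence a large index, which is what drives exploration.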

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems