The Restless Hidden Markov Bandit with Linear Rewards and Side Information

Michal Yemini; Amir Leshem; Anelia Somekh-Baruch

arXiv:1910.10271·cs.LG·January 25, 2021

TL;DR

This paper introduces a hidden Markov bandit model with linear rewards and structural side information. It presents an algorithm that maintains logarithmic regret when the action set is a convex polytope, which makes the approach practical for high-dimensional problems.

Contribution

The paper proposes a model for hidden Markovian bandits in which the decision maker does not observe the state before acting but has structural side information about it, together with an algorithm that achieves logarithmic regret.

Findings

1. The proposed algorithm recovers the hidden states and maintains logarithmic regret when the action set is a convex polytope.

2. With structural side information, the expected regret does not depend on the number of extreme points.

3. The approach is practical for high-dimensional bandit problems.

Abstract

In this paper we present a model for the hidden Markovian bandit problem with linear rewards. As opposed to current work on Markovian bandits, we do not assume that the state is known to the decision maker before making the decision. Furthermore, we assume structural side information: the decision maker knows in advance that there are two types of hidden states. One is common to all arms and evolves according to a Markovian distribution; the other is unique to each arm and follows an i.i.d. process. We present an algorithm and a regret analysis for this problem. Surprisingly, we can recover the hidden states and maintain logarithmic regret in the case of a convex polytope action set. Furthermore, we show that the structural side information leads to expected regret that does not depend on the number of extreme points in the action…
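
The state structure described in the abstract can be illustrated with a small simulation. This is only a toy sketch, not the paper's algorithm or exact model: the two-valued common state, the Gaussian per-arm process, the additive way the two are combined, and every name and parameter below (`simulate_hidden_states`, `stay_prob`, `best_vertex`, and so on) are assumptions made for illustration. The action set is taken to be the probability simplex, one example of a convex polytope, whose extreme points are the unit vectors.

```python
import random


def simulate_hidden_states(num_arms=3, horizon=5, stay_prob=0.9, seed=0):
    """Draw hidden states for each round (all modelling choices are assumptions)."""
    rng = random.Random(seed)
    common_state = 0  # hidden state shared by all arms, values in {0, 1}
    rounds = []
    for _ in range(horizon):
        # Common state follows a two-state Markov chain:
        # it keeps its value with probability stay_prob, otherwise it flips.
        if rng.random() > stay_prob:
            common_state = 1 - common_state
        # Each arm also has its own hidden component, drawn i.i.d. every round.
        arm_states = [rng.gauss(0.2 * (i + 1), 0.1) for i in range(num_arms)]
        # Assumption: the two hidden components combine additively per arm.
        theta = [common_state + s for s in arm_states]
        rounds.append((common_state, theta))
    return rounds


def linear_reward(action, theta):
    """Linear reward: inner product of the action with the per-arm values."""
    return sum(a * x for a, x in zip(action, theta))


def best_vertex(vertices, theta):
    """A linear reward over a polytope is maximized at one of its extreme points."""
    return max(vertices, key=lambda v: linear_reward(v, theta))


if __name__ == "__main__":
    simplex_vertices = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for state, theta in simulate_hidden_states():
        # An oracle that sees theta; the paper's decision maker does not.
        print(state, best_vertex(simplex_vertices, theta))
```

The oracle in the usage loop sees the hidden values directly, which the decision maker in the paper cannot do; the sketch only shows what is hidden and why it suffices to compare extreme points once the values are known.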
