The Restless Hidden Markov Bandit with Linear Rewards and Side Information
Michal Yemini, Amir Leshem, Anelia Somekh-Baruch

TL;DR
This paper introduces a new hidden Markov bandit model with linear rewards and side information, providing algorithms with logarithmic regret and practical solutions for high-dimensional problems.
Contribution
It proposes a novel model for hidden Markovian bandits in which the decision maker does not observe the state but has structural side information, along with an algorithm achieving logarithmic regret.
Findings
The proposed algorithm recovers the hidden states and achieves logarithmic regret when the action set is a convex polytope.
With structural side information, the expected regret no longer depends on the number of extreme points.
The approach is practical for high-dimensional bandit problems.
Abstract
In this paper we present a model for the hidden Markovian bandit problem with linear rewards. As opposed to current work on Markovian bandits, we do not assume that the state is known to the decision maker before making the decision. Furthermore, we assume structural side information where the decision maker knows in advance that there are two types of hidden states: one is common to all arms and evolves according to a Markovian distribution, and the other is unique to each arm and follows an i.i.d. process. We present an algorithm and a regret analysis for this problem. Surprisingly, we can recover the hidden states and maintain logarithmic regret in the case of a convex polytope action set. Furthermore, we show that the structural side information leads to expected regret that does not depend on the number of extreme points in the action…
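To make the setting in the abstract concrete, here is a minimal toy simulation, not the authors' model or algorithm. It assumes a two-state common Markov chain, three arms, uniform arm-specific i.i.d. noise, an additive combination of the two hidden components, and the probability simplex as the convex polytope action set. The abstract does not specify any of these choices, and all names and numbers (`simulate`, `best_vertex`, `STAY_PROB`, `COMMON_MEAN`, `NOISE_SCALE`) are hypothetical.

```python
import random

# All constants are illustrative assumptions, not values from the paper.
STAY_PROB = [0.9, 0.8]            # P(common state stays where it is)
COMMON_MEAN = [[1.0, 0.2, 0.5],   # mean reward per arm in common state 0
               [0.1, 0.9, 0.5]]   # mean reward per arm in common state 1
NOISE_SCALE = 0.1                 # half-width of the arm-specific i.i.d. noise


def simulate(horizon, action=(0.5, 0.3, 0.2), seed=0):
    """Run a fixed action (a point in the simplex) through the toy model.

    Returns the hidden common states and the observed linear rewards.
    A learner would see only the rewards, never the states.
    """
    rng = random.Random(seed)
    state = 0
    states, rewards = [], []
    for _ in range(horizon):
        # Arm-specific hidden component: drawn i.i.d. at every step.
        arm_values = [COMMON_MEAN[state][i] + rng.uniform(-NOISE_SCALE, NOISE_SCALE)
                      for i in range(len(action))]
        # Linear reward: inner product of the action with the arm values.
        rewards.append(sum(a * v for a, v in zip(action, arm_values)))
        states.append(state)
        # Common hidden component: two-state Markov transition.
        if rng.random() > STAY_PROB[state]:
            state = 1 - state
    return states, rewards


def best_vertex(mean_vector):
    """Index of the simplex vertex maximizing a linear reward.

    A linear function over a convex polytope attains its maximum at an
    extreme point, so on the simplex it suffices to compare single arms.
    """
    return max(range(len(mean_vector)), key=lambda i: mean_vector[i])
```

Calling `simulate(1000)` yields a reward trace whose level shifts whenever the hidden common state changes, and `best_vertex(COMMON_MEAN[1])` picks the arm that would be optimal if that state were known.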
