A Hidden Markov Restless Multi-armed Bandit Model for Playout Recommendation Systems

Rahul Meshram; Aditya Gopalan; D. Manjunath

arXiv:1704.02894 · cs.SY · April 11, 2017 · 2 cites

Open Access

TL;DR

This paper models a specialized restless multi-armed bandit for recommendation systems, deriving Whittle indices and proposing a Thompson sampling-based learning algorithm for parameter estimation.

Contribution

It introduces a new RMAB model with two arm types, derives closed-form Whittle indices, and develops a parameter learning algorithm for practical application.

Findings

01. The RMAB is proven to be Whittle-indexable.

02. Closed-form Whittle index expressions are obtained.

03. A Thompson sampling-based algorithm effectively learns arm parameters.

Abstract

We consider a restless multi-armed bandit (RMAB) in which there are two types of arms, say A and B. Each arm can be in one of two states, say $0$ or $1$. Playing a type A arm brings it to state $0$ with probability one, and not playing it induces state transitions with arm-dependent probabilities. Playing a type B arm, in contrast, brings it to state $1$ with probability one, and not playing it induces state transitions that depend on the arm's transition probabilities. Further, playing an arm generates a unit reward with a probability that depends on the state of the arm. The belief about the state of the arm can be calculated using a Bayesian update after every play. This RMAB has been designed for use in recommendation systems where the user's preferences depend on the history of recommendations. It can also be used in applications such as creating playlists or placing advertisements. In this…
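The arm dynamics described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not taken from the paper: the belief is taken to be the probability that the arm is in state 1, `p01` and `p11` are hypothetical not-played transition probabilities into state 1, `rho0` and `rho1` are hypothetical per-state reward probabilities, and the reward is assumed to depend on the state at the moment of play.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """Two-state arm tracked through a belief (all field names are hypothetical)."""
    kind: str            # "A" or "B", the two arm types in the abstract
    p01: float           # assumed Pr(state 0 -> state 1) when the arm is not played
    p11: float           # assumed Pr(state 1 -> state 1) when the arm is not played
    rho0: float          # assumed Pr(unit reward | state 0) when played
    rho1: float          # assumed Pr(unit reward | state 1) when played
    belief: float = 0.0  # assumed convention: belief = Pr(state == 1)

    def expected_reward(self) -> float:
        # Expected immediate reward of playing the arm at the current belief.
        return self.belief * self.rho1 + (1.0 - self.belief) * self.rho0

    def step(self, played: bool) -> None:
        if played:
            # Playing resets the state with probability one:
            # type A goes to state 0, type B goes to state 1.
            self.belief = 0.0 if self.kind == "A" else 1.0
        else:
            # Not playing propagates the belief through the transition probabilities.
            self.belief = self.belief * self.p11 + (1.0 - self.belief) * self.p01


if __name__ == "__main__":
    arm = Arm(kind="A", p01=0.3, p11=0.8, rho0=0.9, rho1=0.1, belief=0.5)
    print(arm.expected_reward())
    arm.step(played=False)
    print(arm.belief)
    arm.step(played=True)
    print(arm.belief)
```

This sketch only tracks beliefs and expected immediate rewards; it does not compute Whittle indices or perform the Thompson sampling-based parameter learning mentioned above.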

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management