Monte Carlo Rollout Policy for Recommendation Systems with Dynamic User Behavior

Rahul Meshram; Kesav Kaza

arXiv:2102.04321 · eess.SY · February 9, 2021

TL;DR

This paper introduces a Monte Carlo rollout policy for online recommendation systems modeled as a hidden Markov multi-state restless multi-armed bandit. It shows numerically that the rollout policy outperforms the myopic policy when the transition dynamics are arbitrary, but not when specific structure is imposed on them.

Contribution

The paper proposes a Monte Carlo rollout policy for a hidden Markov multi-state restless bandit model of recommendation systems and compares its performance with that of the myopic policy under different transition structures.
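One common way to write the two decision rules being compared is shown below. The notation is assumed here rather than taken from the paper: $\pi_t^i(k)$ is the belief that arm (item) $i$ is in hidden state $k$ at time $t$, $\pi_t$ collects these beliefs for all arms, $\rho_i(k)$ is the expected reward (for example, a click probability) of arm $i$ in state $k$, $\beta$ is a discount factor, $H$ is the rollout horizon, and $L$ is the number of simulated trajectories. Each trajectory plays arm $a$ first and then follows a base policy such as the myopic one, and $R_{t+h}^{(l)}$ is the reward collected at step $t+h$ of trajectory $l$.

```latex
a_t^{\mathrm{myopic}} = \arg\max_{i} \sum_{k} \pi_t^i(k)\,\rho_i(k)

\hat{Q}_{H}(\pi_t, a) = \frac{1}{L} \sum_{l=1}^{L} \sum_{h=0}^{H-1} \beta^{h}\, R_{t+h}^{(l)},
\qquad
a_t^{\mathrm{rollout}} = \arg\max_{a} \hat{Q}_{H}(\pi_t, a)
```

The myopic rule looks only at the expected immediate reward, whereas the rollout estimate accounts for how the current choice changes future beliefs and rewards over $H$ steps.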

Findings

01

Monte Carlo rollout outperforms the myopic policy when the transition dynamics are arbitrary, with no specific structure.

02

The myopic policy performs better when specific structure is imposed on the transition dynamics.

03

Both comparisons are illustrated through numerical simulations.

Abstract

We model online recommendation systems using the hidden Markov multi-state restless multi-armed bandit problem. To solve it, we present a Monte Carlo rollout policy. We illustrate numerically that the Monte Carlo rollout policy performs better than the myopic policy for arbitrary transition dynamics with no specific structure. However, when some structure is imposed on the transition dynamics, the myopic policy performs better than the Monte Carlo rollout policy.
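The Python sketch below, using only the standard library, illustrates the two policies being compared on a toy hidden Markov restless bandit. Everything in it is an assumption for illustration, not the authors' implementation: each arm (item) has K hidden user-interest states, a click is observed with state-dependent probability `rho`, and the hidden state evolves under `P_play` when the item is recommended and under `P_rest` otherwise. The helper names (`propagate`, `update_belief`, `myopic_action`, `rollout_action`) and all numbers are hypothetical, and the rollout uses the myopic policy as its base policy.

```python
import random

# Hypothetical toy model (not the authors' code): each arm has K hidden
# interest states, rho[i][k] is the click probability of arm i in state k,
# P_play[i] / P_rest[i] are its transition matrices when shown / not shown.


def propagate(b, P):
    """One Markov step of a belief: returns the row vector b @ P."""
    K = len(b)
    return [sum(b[j] * P[j][k] for j in range(K)) for k in range(K)]


def update_belief(b, played, y, rho, P_play, P_rest):
    """Condition a shown arm on its click y (Bayes), then propagate."""
    if not played:
        return propagate(b, P_rest)  # unshown arms yield no observation
    post = [bk * (r if y == 1 else 1.0 - r) for bk, r in zip(b, rho)]
    z = sum(post)
    return propagate([p / z for p in post], P_play)


def myopic_action(beliefs, rho):
    """Arm with the largest expected immediate click probability."""
    scores = [sum(bk * rk for bk, rk in zip(b, r)) for b, r in zip(beliefs, rho)]
    return max(range(len(scores)), key=scores.__getitem__)


def rollout_action(beliefs, rho, P_play, P_rest, H=5, L=200, beta=0.9, rng=None):
    """Monte Carlo rollout with the myopic policy as the base policy."""
    if rng is None:
        rng = random.Random()
    N, K = len(beliefs), len(beliefs[0])
    values = []
    for a in range(N):
        total = 0.0
        for _ in range(L):
            # Sample each arm's hidden state from its current belief.
            states = [rng.choices(range(K), weights=b)[0] for b in beliefs]
            bel = [list(b) for b in beliefs]
            for h in range(H):
                # Play candidate arm a first, then follow the base policy.
                act = a if h == 0 else myopic_action(bel, rho)
                y = 1 if rng.random() < rho[act][states[act]] else 0
                total += beta**h * y
                for i in range(N):
                    shown = i == act
                    bel[i] = update_belief(bel[i], shown, y, rho[i], P_play[i], P_rest[i])
                    P = P_play[i] if shown else P_rest[i]
                    states[i] = rng.choices(range(K), weights=P[states[i]])[0]
        values.append(total / L)  # estimate of the H-step discounted reward
    return max(range(N), key=values.__getitem__), values


# Usage with made-up numbers: state 0 = interested, state 1 = not interested.
rho = [[0.9, 0.1], [0.6, 0.4]]
P_play = [[[0.7, 0.3], [0.2, 0.8]]] * 2  # showing an item erodes interest
P_rest = [[[0.9, 0.1], [0.5, 0.5]]] * 2  # interest recovers when not shown
beliefs = [[0.3, 0.7], [0.5, 0.5]]
print(myopic_action(beliefs, rho))  # → 1 (scores 0.34 vs 0.5)
print(rollout_action(beliefs, rho, P_play, P_rest, rng=random.Random(0)))
```

Because the rollout value is a sample average, a larger `L` lowers its variance at proportionally higher cost, while the myopic rule needs no simulation at all since it only looks one step ahead.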
