Monte Carlo Rollout Policy for Recommendation Systems with Dynamic User Behavior
Rahul Meshram, Kesav Kaza

TL;DR
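This paper introduces a Monte Carlo rollout policy for online recommendation systems modeled as a hidden Markov multi-state restless multi-armed bandit. In numerical experiments, the rollout policy outperforms a myopic policy when the transition dynamics are arbitrary, while the myopic policy does better when the dynamics have some imposed structure.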
Contribution
The paper proposes a Monte Carlo rollout policy for complex bandit models in recommendation systems and compares its performance to myopic policies under different transition structures.
Findings
The Monte Carlo rollout policy outperforms the myopic policy under arbitrary transition dynamics with no specific structure.
The myopic policy performs better than Monte Carlo rollout when some structure is imposed on the transition dynamics.
Both comparisons are illustrated through numerical simulations.
Abstract
We model online recommendation systems using the hidden Markov multi-state restless multi-armed bandit problem. To solve this problem, we present a Monte Carlo rollout policy. We illustrate numerically that the Monte Carlo rollout policy performs better than the myopic policy for arbitrary transition dynamics with no specific structure. However, when some structure is imposed on the transition dynamics, the myopic policy performs better than the Monte Carlo rollout policy.
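The two policies compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: each arm (item) has a single row-stochastic transition matrix `P[i]` regardless of the action, feedback is a binary click with state-dependent probability `rho[i][k]`, the reward is the click itself, and the rollout uses the myopic policy as its base policy. All function names, the horizon, the number of trajectories, and the discount factor `beta` are hypothetical choices made for illustration.

```python
import numpy as np

def belief_update(b, P, rho, played, click=None):
    """Update one arm's belief over its hidden state (assumed model)."""
    if played:
        # Bayes step on the observed click, then normalise.
        like = rho if click else 1.0 - rho
        post = b * like
        b = post / post.sum()
    # Propagate through the arm's transition matrix.
    return b @ P

def myopic_arm(beliefs, rho):
    """Pick the arm with the highest expected immediate click probability."""
    return int(np.argmax([b @ r for b, r in zip(beliefs, rho)]))

def rollout_arm(beliefs, P, rho, rng, horizon=5, n_traj=50, beta=0.95):
    """Estimate each arm's value by simulating trajectories that play
    that arm first and then follow the myopic base policy."""
    n_arms = len(beliefs)
    values = np.zeros(n_arms)
    for first in range(n_arms):
        total = 0.0
        for _ in range(n_traj):
            # Sample hidden states from the current beliefs.
            states = [rng.choice(len(b), p=b) for b in beliefs]
            bs = [b.copy() for b in beliefs]
            arm = first
            for t in range(horizon):
                click = rng.random() < rho[arm][states[arm]]
                total += (beta ** t) * float(click)
                # Update beliefs for all arms, then evolve the hidden states.
                bs = [belief_update(bs[i], P[i], rho[i], i == arm, click)
                      for i in range(n_arms)]
                states = [rng.choice(len(P[i]), p=P[i][states[i]])
                          for i in range(n_arms)]
                arm = myopic_arm(bs, rho)
        values[first] = total / n_traj
    return int(np.argmax(values)), values

# Hypothetical two-arm, three-state example.
rng = np.random.default_rng(0)
T = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]])
P = [T, T]
rho = [np.array([0.1, 0.5, 0.9]), np.array([0.2, 0.4, 0.6])]
beliefs = [np.full(3, 1 / 3), np.array([0.0, 0.0, 1.0])]
chosen, values = rollout_arm(beliefs, P, rho, rng, horizon=5, n_traj=20)
```

The myopic policy looks only at `b @ rho` for the current step, whereas the rollout estimate also accounts for how playing an arm changes future beliefs. That difference is one plausible reason the relative performance depends on the structure of the transition dynamics.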
