Optimal Recommendation to Users that React: Online Learning for a Class of POMDPs

Rahul Meshram; Aditya Gopalan; D. Manjunath

arXiv:1603.09233·cs.LG·March 31, 2016·1 cites

TL;DR

This paper models an online recommendation system using POMDPs, accounting for time-dependent user preferences influenced by past recommendations, and develops a learning algorithm with provable guarantees.

Contribution

It introduces a realistic POMDP-based model for recommendation systems and proposes a Thompson sampling algorithm with theoretical performance analysis.

Findings

01 Structural properties of the POMDP for a single content item.

02 Optimal policy characterization for the POMDP model.

03 Regret bounds for the proposed learning algorithm.

Abstract

We describe and study a model for an Automated Online Recommendation System (AORS) in which a user's preferences can be time-dependent and can also depend on the history of past recommendations and play-outs. The three key features of the model that make it more realistic than existing models for recommendation systems are (1) user preference is inherently latent, (2) current recommendations can affect future preferences, and (3) it allows for the development of learning algorithms with provable performance guarantees. The problem is cast as an average-cost restless multi-armed bandit for a given user, with an independent partially observable Markov decision process (POMDP) for each item of content. We analyze the POMDP for a single arm, describe its structural properties, and characterize its optimal policy. We then develop a Thompson sampling-based online reinforcement…
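
To make the idea of a latent, recommendation-dependent preference concrete, here is a minimal sketch of a belief update for one content item. It is an illustration under stated assumptions, not the paper's exact model: it assumes two hidden preference states, a user response observed only when the item is recommended, and transition probabilities that depend on the action. The names `belief_update`, `EXAMPLE_TRANS`, and `EXAMPLE_ACCEPT`, and all numeric values, are hypothetical.

```python
def belief_update(belief, action, observation, trans, accept_prob):
    """One-step Bayes update of the belief for a single two-state arm.

    belief      -- current probability that the user is in hidden state 0
    action      -- 1 if the item is recommended, 0 otherwise
    observation -- 1 (accepted), 0 (rejected), or None (nothing observed)
    trans       -- trans[action][s][t]: assumed probability of moving s -> t
    accept_prob -- accept_prob[s]: assumed acceptance probability in state s
    """
    posterior = [belief, 1.0 - belief]

    # Condition on the user's response only when the item was recommended.
    if action == 1 and observation is not None:
        likelihood = [p if observation == 1 else 1.0 - p for p in accept_prob]
        unnormalised = [posterior[s] * likelihood[s] for s in range(2)]
        total = sum(unnormalised)  # assumed nonzero for the given inputs
        posterior = [u / total for u in unnormalised]

    # Propagate through the action-dependent transition matrix.
    matrix = trans[action]
    return posterior[0] * matrix[0][0] + posterior[1] * matrix[1][0]


# Hypothetical parameters, chosen only for illustration.
EXAMPLE_TRANS = {
    0: [[0.9, 0.1], [0.3, 0.7]],  # item not recommended
    1: [[0.6, 0.4], [0.1, 0.9]],  # item recommended
}
EXAMPLE_ACCEPT = [0.8, 0.2]

if __name__ == "__main__":
    print(belief_update(0.5, 1, 1, EXAMPLE_TRANS, EXAMPLE_ACCEPT))
    print(belief_update(0.5, 0, None, EXAMPLE_TRANS, EXAMPLE_ACCEPT))
```

In this sketch the belief is the only state the recommender tracks for each item. A Thompson sampling learner of the kind the abstract mentions would, under these assumptions, draw the unknown parameters from a posterior distribution and then act on beliefs computed this way.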

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics