Policy Gradients for Contextual Recommendations

Feiyang Pan; Qingpeng Cai; Pingzhong Tang; Fuzhen Zhuang; Qing He

arXiv:1802.04162 · cs.LG · March 5, 2019

TL;DR

This paper introduces PGCR, a novel policy gradient method for contextual recommendations that handles complex, dynamic environments and outperforms existing bandit algorithms in convergence speed and regret.

Contribution

PGCR is a new policy gradient approach that relaxes simplifying assumptions of traditional bandit methods, enabling more realistic and effective recommendation strategies.

Findings

01. PGCR achieves faster convergence than traditional methods.

02. PGCR demonstrates lower regret in experiments.

03. PGCR outperforms vanilla policy gradient methods.

Abstract

Decision making is a challenging task in online recommender systems. The decision maker often needs to choose a contextual item at each step from a set of candidates. Contextual bandit algorithms have been successfully deployed in such applications because they balance exploration and exploitation and achieve state-of-the-art performance in minimizing online costs. However, the applicability of existing contextual bandit methods is limited by over-simplified assumptions about the problem, such as assuming a simple form of the reward function or assuming a static environment where the states are not affected by previous actions. In this work, we put forward Policy Gradients for Contextual Recommendations (PGCR) to solve the problem without those unrealistic assumptions. It optimizes over a restricted class of policies where the marginal probability of choosing an item (in expectation of…

Taxonomy

Methods: Dropout