Policy Gradients for Contextual Recommendations
Feiyang Pan, Qingpeng Cai, Pingzhong Tang, Fuzhen Zhuang, Qing He

TL;DR
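This paper introduces PGCR, a policy gradient method for contextual recommendations that drops two common simplifying assumptions of contextual bandits (a simple form of the reward function, and a static environment whose states are unaffected by previous actions) and reports faster convergence and lower regret than existing bandit algorithms.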
Contribution
PGCR is a new policy gradient approach that relaxes simplifying assumptions of traditional bandit methods, enabling more realistic and effective recommendation strategies.
Findings
PGCR achieves faster convergence than traditional methods.
PGCR demonstrates lower regret in experiments.
PGCR outperforms vanilla policy gradient methods.
Abstract
Decision making is a challenging task in online recommender systems. The decision maker often needs to choose a contextual item at each step from a set of candidates. Contextual bandit algorithms have been successfully deployed in such applications because they trade off exploration against exploitation and achieve state-of-the-art performance in minimizing online costs. However, the applicability of existing contextual bandit methods is limited by oversimplified assumptions about the problem, such as assuming a simple form of the reward function or assuming a static environment where the states are not affected by previous actions. In this work, we put forward Policy Gradients for Contextual Recommendations (PGCR) to solve the problem without those unrealistic assumptions. It optimizes over a restricted class of policies where the marginal probability of choosing an item (in expectation of…
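For readers unfamiliar with the baseline that the Findings compare against, the following is a minimal sketch of a vanilla policy gradient (REINFORCE) update on a simulated contextual recommendation task. It is not the PGCR algorithm, whose details are cut off in the abstract above. Everything in it is an assumption made for illustration: the linear softmax policy, the logistic click model, the hidden parameter `true_w`, the running-mean baseline, and all hyperparameters.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of scores."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def run_reinforce(steps=5000, n_items=5, dim=4, lr=0.1, seed=0):
    """Vanilla policy gradient on a toy contextual recommendation problem.

    Hypothetical setup (not from the paper): at each step the learner sees
    n_items candidate feature vectors, samples one from a softmax policy
    with linear scores, and observes a binary click reward.
    """
    rng = np.random.default_rng(seed)
    true_w = np.array([2.0, -1.0, 0.5, 0.0])[:dim]  # assumed hidden click model
    theta = np.zeros(dim)                            # policy parameters
    baseline = 0.0                                   # running mean reward
    rewards = np.zeros(steps)

    for t in range(steps):
        X = rng.normal(size=(n_items, dim))          # candidate item contexts
        probs = softmax(X @ theta)                   # pi_theta(item | contexts)
        a = rng.choice(n_items, p=probs)             # sample a recommendation

        p_click = 1.0 / (1.0 + np.exp(-(X[a] @ true_w)))
        r = float(rng.random() < p_click)            # simulated binary feedback

        # Score function of a linear softmax policy:
        # grad_theta log pi(a) = x_a - sum_k pi_k * x_k
        grad_logp = X[a] - probs @ X

        # REINFORCE step; subtracting the baseline reduces variance.
        theta += lr * (r - baseline) * grad_logp
        baseline += 0.01 * (r - baseline)
        rewards[t] = r

    return theta, rewards, true_w


if __name__ == "__main__":
    theta, rewards, true_w = run_reinforce()
    print("mean reward, first 1000 steps:", rewards[:1000].mean())
    print("mean reward, last 1000 steps: ", rewards[-1000:].mean())
```

In this sketch the contexts are redrawn independently at every step, so it has exactly the static-environment property that the abstract describes as a limitation of existing methods; it only illustrates the generic score-function update.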
Taxonomy
Methods: Dropout
