Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems

Tianchi Cai; Shenliao Bao; Jiyan Jiang; Shiji Zhou; Wenpeng Zhang; Lihong Gu; Jinjie Gu; Guannan Zhang

arXiv:2308.13246 · cs.LG · August 28, 2023

TL;DR

This paper addresses the challenge of stochastic user feedback in model-free reinforcement learning for recommender systems by proposing two stochastic reward stabilization frameworks that improve recommendation performance.

Contribution

It introduces two novel, model-agnostic frameworks that stabilize stochastic rewards in RL-based recommender systems by replacing the raw stochastic feedback with rewards learned by a supervised model.

Findings

01. The proposed frameworks outperform baseline methods in simulations.

02. The frameworks can make effective use of a variety of supervised models.

03. Significant performance improvements in industrial recommender systems.

Abstract

Model-free RL-based recommender systems have recently received increasing research attention due to their ability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature of recommender systems: a user's feedback on the same item at different times is random. This stochastic-reward property differs fundamentally from classic RL scenarios with deterministic rewards and makes RL-based recommender systems much more challenging. In this paper, we first demonstrate in a simulator environment that using direct stochastic feedback results in a significant drop in performance. Then, to handle the stochastic feedback more efficiently, we design two stochastic reward stabilization frameworks that replace the direct stochastic feedback with feedback learned by a supervised model. Both frameworks are model-agnostic, i.e., they can…
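As a rough illustration of the core idea described in the abstract (replacing a raw stochastic reward with one predicted by a supervised model), here is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the environment, the tabular Q-learning agent, and the names `MeanRewardModel`, `q_learning`, `noisy_env`, and `stabilize` are all hypothetical. The "supervised model" is simply a per-(state, action) running mean of observed feedback, standing in for whatever learner one would actually plug in.

```python
import random
from collections import defaultdict


class MeanRewardModel:
    """Hypothetical stand-in for a supervised reward model.

    Predicts E[r | state, action] as the running mean of observed feedback.
    """

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, state, action, reward):
        # "Train" on one observed (state, action, reward) sample.
        self.total[(state, action)] += reward
        self.count[(state, action)] += 1

    def predict(self, state, action):
        # Expected-reward estimate; 0.0 for pairs never observed.
        n = self.count[(state, action)]
        return self.total[(state, action)] / n if n else 0.0


def q_learning(env_step, n_states, n_actions, steps=2000, alpha=0.1,
               gamma=0.9, epsilon=0.1, stabilize=True, seed=0):
    """Tabular Q-learning (hypothetical sketch). If stabilize=True, the TD
    target uses the model's predicted reward instead of the raw sample."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    model = MeanRewardModel()
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])
        raw_reward, next_state = env_step(state, action, rng)
        model.update(state, action, raw_reward)
        # Stabilization step: swap the noisy sample for the model's estimate.
        reward = model.predict(state, action) if stabilize else raw_reward
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q, model


# Hypothetical single-state environment: click probability 0.3 or 0.6 per item.
def noisy_env(state, action, rng):
    click = 1.0 if rng.random() < (0.3, 0.6)[action] else 0.0
    return click, 0


q_stable, model = q_learning(noisy_env, n_states=1, n_actions=2, stabilize=True)
q_raw, _ = q_learning(noisy_env, n_states=1, n_actions=2, stabilize=False)
```

With `stabilize=False` the temporal-difference target uses the raw 0/1 click, so the noise in each target never shrinks; with `stabilize=True` it uses the running-mean estimate, whose variance falls as the count for that (state, action) pair grows. A learned model such as a neural click predictor would additionally generalize across states, which this lookup table cannot.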
