Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems
Tianchi Cai, Shenliao Bao, Jiyan Jiang, Shiji Zhou, Wenpeng Zhang, Lihong Gu, Jinjie Gu, Guannan Zhang

TL;DR
This paper addresses the challenge of stochastic user feedback in model-free reinforcement learning for recommender systems by proposing two stabilization frameworks that improve performance.
Contribution
It introduces two novel, model-agnostic frameworks for stabilizing stochastic rewards in RL-based recommender systems, enhancing their effectiveness.
Findings
Proposed frameworks outperform baseline methods in simulations.
Frameworks effectively utilize various supervised models.
Frameworks deliver significant performance improvements in industrial recommender systems.
Abstract
Model-free RL-based recommender systems have recently received increasing research attention due to their capability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature of recommender systems: one user's feedback on the same item at different times is random. This stochastic-reward property differs fundamentally from classic RL scenarios with deterministic rewards, and it makes RL-based recommender systems much more challenging. In this paper, we first demonstrate in a simulator environment that using direct stochastic feedback results in a significant drop in performance. Then, to handle the stochastic feedback more efficiently, we design two stochastic reward stabilization frameworks that replace the direct stochastic feedback with that learned by a supervised model. Both frameworks are model-agnostic, i.e., they can…
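The core idea in the abstract, replacing a raw stochastic reward with the prediction of a supervised model before it reaches the RL update, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the class `MeanRewardModel` is a hypothetical stand-in that uses a running mean per (user, item) pair in place of a real supervised model, and `stabilized_reward`, `td_target`, and `demo` are hypothetical helper names. The sketch only shows why a learned expected reward is less noisy than a single Bernoulli click.

```python
import random
from collections import defaultdict
from statistics import pvariance


class MeanRewardModel:
    """Hypothetical stand-in for a supervised reward model (assumption).

    Estimates E[reward | user, item] as a running mean of observed feedback.
    """

    def __init__(self, default=0.5):
        self.default = default            # prediction for unseen (user, item) pairs
        self.total = defaultdict(float)   # sum of observed rewards per pair
        self.count = defaultdict(int)     # number of observations per pair

    def update(self, key, reward):
        self.total[key] += reward
        self.count[key] += 1

    def predict(self, key):
        if self.count[key] == 0:
            return self.default
        return self.total[key] / self.count[key]


def stabilized_reward(model, key, observed):
    """Fit the model on the observed feedback, then return its prediction."""
    model.update(key, observed)
    return model.predict(key)


def td_target(reward, next_q_max, gamma=0.9):
    """One-step Q-learning target; `reward` may be raw or stabilized."""
    return reward + gamma * next_q_max


def demo(p_click=0.3, steps=2000, seed=0):
    """Compare the variance of raw and stabilized rewards for one pair."""
    rng = random.Random(seed)
    model = MeanRewardModel()
    key = ("user_0", "item_0")            # illustrative identifiers
    raw, stab = [], []
    for _ in range(steps):
        r = 1.0 if rng.random() < p_click else 0.0   # stochastic click feedback
        raw.append(r)
        stab.append(stabilized_reward(model, key, r))
    half = steps // 2                     # skip the model's warm-up phase
    return pvariance(raw[half:]), pvariance(stab[half:])
```

Calling `demo()` returns the variance of the raw rewards and of the stabilized rewards over the second half of the run. In a model-free learner, the stabilized value would be passed to `td_target` in place of the raw click, so the update target fluctuates less between visits to the same (user, item) pair.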
