ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems
Yi Zhang, Ruihong Qiu, Jiajun Liu, Sen Wang

TL;DR
ROLeR introduces a novel reward shaping method and improved uncertainty estimation for offline reinforcement learning in recommender systems, leading to state-of-the-art performance on benchmark datasets.
Contribution
The paper proposes ROLeR, a new approach that enhances reward modeling and uncertainty estimation in model-based offline RL for recommender systems.
Findings
ROLeR outperforms existing baselines on four benchmark datasets.
The non-parametric reward shaping improves reward model accuracy.
Enhanced uncertainty penalties lead to better recommendation performance.
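To make the last two findings concrete, here is a minimal sketch of one way a non-parametric reward estimate and an uncertainty penalty could be combined. This summary does not describe ROLeR's actual procedure, so everything here is an assumption for illustration: the k-nearest-neighbour estimator, the distance-based uncertainty, the helper name `knn_shaped_reward`, the `penalty_weight` parameter, and the toy data.

```python
import numpy as np


def knn_shaped_reward(query, embeddings, rewards, k=3, penalty_weight=0.1):
    """Hypothetical helper: kNN reward estimate minus a distance-based penalty.

    Illustrative only; not taken from the ROLeR paper.
    """
    # Euclidean distance from the query point to every logged embedding.
    dists = np.linalg.norm(embeddings - query, axis=1)
    # Indices of the k closest logged points.
    nearest = np.argsort(dists)[:k]
    # Non-parametric reward estimate: average logged reward of the neighbours.
    reward_estimate = rewards[nearest].mean()
    # Assumed uncertainty proxy: mean distance to those neighbours.
    uncertainty = dists[nearest].mean()
    shaped = reward_estimate - penalty_weight * uncertainty
    return shaped, reward_estimate, uncertainty


# Toy logged data (made up for this example).
logged_embeddings = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
logged_rewards = np.array([1.0, 0.0, 1.0, 0.0])

shaped, estimate, uncertainty = knn_shaped_reward(
    np.array([0.0, 0.0]), logged_embeddings, logged_rewards,
    k=3, penalty_weight=0.5,
)
print(shaped, estimate, uncertainty)
```

In this sketch, queries far from the logged data get a larger penalty, which is the general idea behind pessimistic reward estimates in model-based offline RL; the paper's own estimator and penalty may differ.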
Abstract
Offline reinforcement learning (RL) is an effective tool for real-world recommender systems with its capacity to model the dynamic interest of users and its interactive nature. Most existing offline RL recommender systems focus on model-based RL through learning a world model from offline data and building the recommendation policy by interacting with this model. Although these methods have made progress in the recommendation performance, the effectiveness of model-based offline RL methods is often constrained by the accuracy of the estimation of the reward model and the model uncertainties, primarily due to the extreme discrepancy between offline logged data and real-world data in user interactions with online platforms. To fill this gap, a more accurate reward model and uncertainty estimation are needed for the model-based RL methods. In this paper, a novel model-based Reward Shaping…
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Focus
