Reward Balancing Revisited: Enhancing Offline Reinforcement Learning for Recommender Systems

Wenzheng Shu; Yanxiang Zeng; Yongxiang Tang; Teng Sha; Ning Luo; Yanhua Cheng; Xialong Liu; Fan Zhou; Peng Jiang

arXiv:2506.22112 · cs.IR · July 1, 2025

TL;DR

This paper introduces R3S, an offline reinforcement learning framework that balances reward predictions and policy diversity, improving recommender system performance by integrating model uncertainty and penalization strategies.

Contribution

The paper proposes R3S, a novel offline RL approach that addresses reward balancing and diversity in recommender systems through uncertainty integration and penalization.

Findings

01 R3S enhances world model accuracy.

02 R3S improves recommendation diversity.

03 R3S effectively balances user preferences.

Abstract

Offline reinforcement learning (RL) has emerged as a prevalent and effective methodology for real-world recommender systems, enabling policies to be learned from historical data while capturing user preferences. Reward shaping remains a significant challenge in offline RL: past efforts have incorporated prior strategies for uncertainty either to improve world models or to penalize underexplored state-action pairs. Despite these efforts, a critical gap remains: the simultaneous balancing of intrinsic biases in world models and the diversity of policy recommendations. To address this limitation, we present an innovative offline RL framework termed Reallocated Reward for Recommender Systems (R3S). By integrating inherent model uncertainty to tackle the intrinsic fluctuations in reward predictions, we boost diversity for decision-making to align with a more interactive paradigm, incorporating extra…
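The abstract is truncated here and does not give the R3S reward formula, so the following is only a minimal sketch of the general idea it names: using world-model uncertainty to adjust a predicted reward. It assumes a hypothetical ensemble of reward predictors and uses their disagreement as the uncertainty signal, a penalty style common in model-based offline RL. The function name `penalized_reward` and the parameter `penalty_weight` are illustrative assumptions, not taken from the paper.

```python
from statistics import mean, pstdev


def penalized_reward(ensemble_predictions, penalty_weight=1.0):
    """Return the ensemble mean reward minus a disagreement penalty.

    ensemble_predictions: reward predictions for one state-action pair,
    one per member of a hypothetical world-model ensemble.
    penalty_weight: assumed hyperparameter scaling the uncertainty penalty.
    """
    if not ensemble_predictions:
        raise ValueError("need at least one prediction")
    mean_reward = mean(ensemble_predictions)
    # Population standard deviation across members is the uncertainty proxy.
    uncertainty = pstdev(ensemble_predictions)
    return mean_reward - penalty_weight * uncertainty


# When members agree, the reward passes through unchanged.
agreeing = penalized_reward([0.8, 0.8, 0.8])
# When members disagree, the same mean reward is discounted.
disagreeing = penalized_reward([0.2, 0.8, 1.4])
```

Both calls share the same mean prediction, but the second is discounted because its ensemble members disagree. Setting `penalty_weight` to 0 recovers the plain ensemble mean; how R3S itself reallocates reward and promotes diversity is not recoverable from the truncated abstract.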
