DARLR: Dual-Agent Offline Reinforcement Learning for Recommender Systems with Dynamic Reward

Yi Zhang; Ruihong Qiu; Xuwei Xu; Jiajun Liu; Sen Wang

arXiv:2505.07257·cs.IR·May 13, 2025

TL;DR

DARLR is a dual-agent offline reinforcement learning framework that combines dynamic reward shaping with adaptive uncertainty penalties. It improves recommendation policies by mitigating the inaccurate reward functions of world models trained on sparse offline logs.

Contribution

The paper proposes a dual-agent framework, made up of a selector and a recommender, for dynamic reward estimation and uncertainty management in offline RL for recommender systems.

Findings

01 DARLR outperforms existing methods on four benchmark datasets.

02 Dynamic reward shaping improves policy accuracy.

03 Adaptive uncertainty penalties better mitigate decision risk.

Abstract

Model-based offline reinforcement learning (RL) has emerged as a promising approach for recommender systems, enabling effective policy learning by interacting with frozen world models. However, the reward functions in these world models, trained on sparse offline logs, often suffer from inaccuracies. Specifically, existing methods face two major limitations in addressing this challenge: (1) deterministic use of reward functions as static look-up tables, which propagates inaccuracies during policy learning, and (2) static uncertainty designs that fail to effectively capture decision risks and mitigate the impact of these inaccuracies. In this work, a dual-agent framework, DARLR, is proposed to dynamically update world models to enhance recommendation policies. To achieve this, a selector is introduced to identify reference users by balancing similarity and diversity so…
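
To make the idea of choosing reference users "by balancing similarity and diversity" concrete, here is a minimal sketch. It is not the paper's selector, which is a learned agent whose details are cut off in the abstract above. It substitutes a simple greedy, maximal-marginal-relevance-style heuristic over user embedding vectors. The names `cosine`, `select_reference_users`, and the `trade_off` parameter are hypothetical and used only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_reference_users(target, candidates, k, trade_off=0.7):
    """Greedily pick up to k candidate indices (hypothetical helper, not DARLR's selector).

    Each step scores a candidate as
        trade_off * similarity_to_target
        - (1 - trade_off) * max_similarity_to_already_selected
    so trade_off=1.0 ranks purely by similarity, and lower values
    increasingly penalize candidates that duplicate earlier picks.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(target, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = trade_off * relevance - (1.0 - trade_off) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Toy embeddings (made up for illustration): users 0 and 1 are near-duplicates.
target_user = [1.0, 0.2]
candidate_users = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]

similar_only = select_reference_users(target_user, candidate_users, k=2, trade_off=1.0)
balanced = select_reference_users(target_user, candidate_users, k=2, trade_off=0.5)
```

With `trade_off=1.0` the two near-duplicate users are both chosen. With `trade_off=0.5` the second pick shifts to the more distinct user, which is the kind of similarity-versus-diversity trade-off the abstract refers to.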

Taxonomy

Topics: Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI) · Advanced Bandit Algorithms Research

Methods: ALIGN