RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems

Wenwen Zeng; Jinhui Zhang; Hao Chen; Zhaoyu Hu; Yongqi Liang; Jiajun Chai; Dengcan Liu; Zhenfeng Liu; Shurui Yan; Minglong Xue; Xiaohan Wang; Wei Lin; Guojun Yin

arXiv:2605.11874·cs.IR·May 13, 2026

TL;DR

RecRM-Bench is a comprehensive benchmark dataset designed to evaluate multi-dimensional reward models for agentic recommender systems, addressing the limitations of single-dimensional reward approaches.

Contribution

The paper introduces what the authors describe as the largest dataset for multi-dimensional reward modeling in recommender systems, and proposes a systematic framework for constructing reward models.

Findings

1. Over 1 million entries across four evaluation dimensions.

2. Supports assessment from syntax to complex intent grounding.

3. Provides a foundation for training advanced reward models.

Abstract

The integration of Large Language Model (LLM) agents is transforming recommender systems from simple query-item matching towards deeply personalized and interactive recommendations. Reinforcement Learning (RL) provides an essential framework for optimizing these agents in recommendation tasks. However, current methodologies remain limited by a reliance on single-dimensional, outcome-based rewards that focus exclusively on final user interactions, overlooking critical intermediate capabilities such as instruction following and complex intent understanding. Despite the need for multi-dimensional rewards, the field lacks a standardized benchmark to facilitate their development. To bridge this gap, we introduce RecRM-Bench, the largest and most comprehensive benchmark to date for agentic recommender systems. It comprises over 1 million structured entries across four…
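
The contrast the abstract draws between a single outcome-based reward and a multi-dimensional reward can be illustrated with a small sketch. This is not the paper's method: the class `RewardSignals`, the three dimension names, the score ranges, and the weighted-average aggregation are all assumptions made for illustration (the abstract names instruction following and intent understanding only as examples of intermediate capabilities).

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RewardSignals:
    """Hypothetical per-response scores, each assumed to lie in [0, 1]."""
    outcome: float                # final user interaction, e.g. 1.0 for a click
    instruction_following: float  # assumed intermediate capability score
    intent_understanding: float   # assumed intermediate capability score


def outcome_only_reward(signals: RewardSignals) -> float:
    """Single-dimensional reward: looks only at the final interaction."""
    return signals.outcome


def multi_dimensional_reward(signals: RewardSignals, weights: dict) -> float:
    """Weighted average over the named dimensions (an assumed aggregation)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scores = asdict(signals)
    return sum(weights[name] * scores[name] for name in weights) / total


# Two responses with the same final outcome but different intermediate quality.
sloppy = RewardSignals(outcome=1.0, instruction_following=0.0, intent_understanding=0.0)
careful = RewardSignals(outcome=1.0, instruction_following=1.0, intent_understanding=1.0)

equal_weights = {"outcome": 1.0, "instruction_following": 1.0, "intent_understanding": 1.0}

print(outcome_only_reward(sloppy), outcome_only_reward(careful))
print(multi_dimensional_reward(sloppy, equal_weights),
      multi_dimensional_reward(careful, equal_weights))
```

Under these assumptions the outcome-only reward cannot tell the two responses apart, while the multi-dimensional reward ranks the careful response higher.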

Code & Models

Repositories

https://huggingface.co/datasets/wwzeng/RecRM-Bench

Datasets

wwzeng/RecRM-Bench
dataset · 192 dl