RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems
Wenwen Zeng, Jinhui Zhang, Hao Chen, Zhaoyu Hu, Yongqi Liang, Jiajun Chai, Dengcan Liu, Zhenfeng Liu, Shurui Yan, Minglong Xue, Xiaohan Wang, Wei Lin, Guojun Yin

TL;DR
RecRM-Bench is a comprehensive benchmark dataset designed to evaluate multi-dimensional reward models for agentic recommender systems, addressing the limitations of single-dimensional reward approaches.
Contribution
It introduces the largest dataset for multi-dimensional reward modeling in recommender systems and proposes a systematic framework for reward model construction.
Findings
Over 1 million entries across four evaluation dimensions.
Supports assessment of capabilities ranging from basic syntax to complex intent grounding.
Provides a foundation for training advanced reward models.
Abstract
The integration of Large Language Model (LLM) agents is transforming recommender systems from simple query-item matching towards deeply personalized and interactive recommendations. Reinforcement Learning (RL) provides an essential framework for optimizing these agents in recommendation tasks. However, current methodologies remain limited by their reliance on single-dimensional, outcome-based rewards that focus exclusively on final user interactions, overlooking critical intermediate capabilities such as instruction following and complex intent understanding. Despite the need for multi-dimensional rewards, the field lacks a standardized benchmark to support their development. To bridge this gap, we introduce RecRM-Bench, the largest and most comprehensive benchmark to date for agentic recommender systems. It comprises over 1 million structured entries across four…
