AgentRecBench: Benchmarking LLM Agent-based Personalized Recommender Systems

Yu Shang; Peijie Liu; Yuwei Yan; Zijing Wu; Leheng Sheng; Yuanqing Yu; Chumeng Jiang; An Zhang; Fengli Xu; Yu Wang; Min Zhang; Yong Li

arXiv:2505.19623·cs.IR·May 29, 2025

TL;DR

This paper introduces AgentRecBench, a benchmark environment for evaluating LLM-powered agentic recommender systems. Through standardized testing and community engagement, it highlights the advantages of agentic systems over traditional recommendation methods.

Contribution

The paper presents a benchmark framework, an interactive recommendation simulator, and a comparative study of classical and agentic recommendation methods.

Findings

01. Agentic systems outperform classical recommendation methods.

02. The benchmark environment is validated and publicly available.

03. The paper provides guidelines for designing effective agentic recommender systems.

Abstract

The emergence of agentic recommender systems powered by Large Language Models (LLMs) represents a paradigm shift in personalized recommendations, leveraging LLMs' advanced reasoning and role-playing capabilities to enable autonomous, adaptive decision-making. Unlike traditional recommendation approaches, agentic recommender systems can dynamically gather and interpret user-item interactions from complex environments, generating robust recommendation strategies that generalize across diverse scenarios. However, the field currently lacks standardized evaluation protocols to systematically assess these methods. To address this critical gap, we propose: (1) an interactive textual recommendation simulator incorporating rich user and item metadata and three typical evaluation scenarios (classic, evolving-interest, and cold-start recommendation tasks); (2) a unified modular framework for…
