ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants

Pei Wang; Yanan Wu; Xiaoshuai Song; Weixun Wang; Gengru Chen; Zhongwen Li; Kezhong Yan; Ken Deng; Qi Liu; Shuaibing Zhao; Shaopan Xiong; Xuepeng Liu; Xuefeng Chen; Wanxi Deng; Wenbo Su; Bo Zheng

arXiv:2601.18225 · cs.AI · January 27, 2026

Open Access

TL;DR

ShopSimulator is a comprehensive Chinese shopping environment for evaluating and training LLM-based shopping agents, revealing current limitations and guiding improvements through combined supervised fine-tuning and reinforcement learning.

Contribution

Introduces ShopSimulator, a unified simulation platform for training and evaluating LLM shopping agents, and demonstrates how combined SFT and RL enhance agent performance.

Findings

01

Even the best-performing LLMs achieve a full-success rate below 40% on shopping tasks

02

Agents struggle with deep search and product selection in long trajectories

03

Combining SFT and RL significantly improves performance

Abstract

Large language model (LLM)-based agents are increasingly deployed in e-commerce shopping. To perform thorough, user-tailored product searches, agents should interpret personal preferences, engage in multi-turn dialogues, and ultimately retrieve and discriminate among highly similar products. However, existing research has yet to provide a unified simulation environment that consistently captures all of these aspects, and it focuses solely on evaluation benchmarks without training support. In this paper, we introduce ShopSimulator, a large-scale and challenging Chinese shopping environment. Leveraging ShopSimulator, we evaluate LLMs across diverse scenarios, finding that even the best-performing models achieve a full-success rate below 40%. Error analysis reveals that agents struggle with deep search and product selection in long trajectories, fail to balance the use of…

Taxonomy

Topics: AI in Service Interactions · Multimodal Machine Learning Applications · Recommender Systems and Techniques