ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants
Pei Wang, Yanan Wu, Xiaoshuai Song, Weixun Wang, Gengru Chen, Zhongwen Li, Kezhong Yan, Ken Deng, Qi Liu, Shuaibing Zhao, Shaopan Xiong, Xuepeng Liu, Xuefeng Chen, Wanxi Deng, Wenbo Su, Bo Zheng

TL;DR
ShopSimulator is a comprehensive Chinese shopping environment for evaluating and training LLM-based shopping agents, revealing current limitations and guiding improvements through combined supervised fine-tuning and reinforcement learning.
Contribution
Introduces ShopSimulator, a unified simulation platform for training and evaluating LLM shopping agents, and demonstrates how combined SFT and RL enhance agent performance.
Findings
Even the best-performing LLMs achieve a full-success rate below 40% on shopping tasks
Agents struggle with deep search and product selection in long trajectories
Combined SFT and RL significantly improve performance
Abstract
Large language model (LLM)-based agents are increasingly deployed in e-commerce shopping. To perform thorough, user-tailored product searches, agents should interpret personal preferences, engage in multi-turn dialogues, and ultimately retrieve and discriminate among highly similar products. However, existing research has yet to provide a unified simulation environment that consistently captures all of these aspects, and it typically focuses solely on evaluation benchmarks without training support. In this paper, we introduce ShopSimulator, a large-scale and challenging Chinese shopping environment. Leveraging ShopSimulator, we evaluate LLMs across diverse scenarios and find that even the best-performing models achieve a full-success rate below 40%. Error analysis reveals that agents struggle with deep search and product selection in long trajectories, fail to balance the use of…
Taxonomy
Topics: AI in Service Interactions · Multimodal Machine Learning Applications · Recommender Systems and Techniques
