EComStage: Stage-wise and Orientation-specific Benchmarking for Large Language Models in E-commerce
Kaiyan Zhao, Zijie Meng, Zheyong Xie, Jin Duan, Yao Hu, Zuozhu Liu, Shaosheng Cao

TL;DR
EComStage introduces a comprehensive, stage-wise benchmark for evaluating large language models in e-commerce, covering perception, planning, and action across customer and merchant scenarios, with insights into model strengths and weaknesses.
Contribution
The paper presents EComStage, a novel benchmark that assesses LLMs across multiple reasoning stages and diverse e-commerce scenarios, including merchant-oriented tasks, which were overlooked by prior benchmarks.
Findings
Evaluated over 30 LLMs, revealing stage-specific strengths and weaknesses.
Identified performance gaps in perception, planning, and action stages.
Provided actionable insights for optimizing LLMs in real-world e-commerce applications.
Abstract
Large Language Model (LLM)-based agents are increasingly deployed in e-commerce applications to assist customer services in tasks such as product inquiries, recommendations, and order management. Existing benchmarks primarily evaluate whether these agents successfully complete the final task, overlooking the intermediate reasoning stages that are crucial for effective decision-making. To address this gap, we propose EComStage, a unified benchmark for evaluating agent-capable LLMs across the comprehensive stage-wise reasoning process: Perception (understanding user intent), Planning (formulating an action plan), and Action (executing the decision). EComStage evaluates LLMs through seven separate representative tasks spanning diverse e-commerce scenarios, with all samples human-annotated and quality-checked. Unlike prior benchmarks that focus only on customer-oriented interactions,…
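The stage-wise, orientation-specific breakdown described above can be illustrated with a small scoring sketch. This is not the paper's evaluation code: the `Sample` record, the `stage_accuracy` helper, and the use of plain accuracy as the metric are all assumptions made for illustration, and the records are made-up toy data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical record for one benchmark item (not the paper's schema)."""
    stage: str        # "perception", "planning", or "action"
    orientation: str  # "customer" or "merchant"
    correct: bool     # did the model output match the human annotation?


def stage_accuracy(samples):
    """Return accuracy for each (stage, orientation) pair seen in samples."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for s in samples:
        key = (s.stage, s.orientation)
        totals[key] += 1
        hits[key] += int(s.correct)
    return {key: hits[key] / totals[key] for key in totals}


# Toy, made-up records used only to exercise the helper.
samples = [
    Sample("perception", "customer", True),
    Sample("perception", "customer", False),
    Sample("planning", "merchant", True),
    Sample("action", "customer", True),
]
scores = stage_accuracy(samples)
for (stage, orientation), acc in sorted(scores.items()):
    print(f"{stage:<10} {orientation:<8} {acc:.2f}")
```

Reporting one score per stage and orientation, rather than a single end-to-end success rate, is what lets a weakness in one stage (for example planning) show up separately from the others.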
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Forecasting Techniques and Applications
