EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
Rui Min, Zile Qiao, Ze Xu, Jiawen Zhai, Wenyu Gao, Xuanzhong Chen, Haozhen Sun, Zhen Zhang, Xinyu Wang, Hong Zhou, Wenbiao Yin, Bo Zhang, Xuan Zhou, Ming Yan, Yong Jiang, Haicheng Liu, Liang Ding, Ling Zou, Yi R. Fung, Yalong Li, Pengjun Xie

TL;DR
EcomBench is a benchmark for evaluating foundation agents in realistic e-commerce environments. Its tasks are built from real user data and target practical capabilities such as information retrieval, reasoning, and knowledge integration.
Contribution
The paper introduces EcomBench, a novel, real-world e-commerce benchmark with curated data and multiple task categories to assess agent capabilities in practical scenarios.
Findings
EcomBench covers diverse e-commerce tasks with three difficulty levels.
It is built from genuine user demands drawn from leading global e-commerce ecosystems.
EcomBench enables rigorous evaluation of agents' real-world performance.
Abstract
Foundation agents have rapidly advanced in their ability to reason and interact with real environments, making the evaluation of their core capabilities increasingly important. While many benchmarks have been developed to assess agent performance, most concentrate on academic settings or artificially designed scenarios while overlooking the challenges that arise in real applications. To address this issue, we focus on a highly practical real-world setting, the e-commerce domain, which involves a large volume of diverse user interactions, dynamic market conditions, and tasks directly tied to real decision-making processes. To this end, we introduce EcomBench, a holistic E-commerce Benchmark designed to evaluate agent performance in realistic e-commerce environments. EcomBench is built from genuine user demands embedded in leading global e-commerce ecosystems and is carefully curated and…
Taxonomy
Topics: Multi-Agent Systems and Negotiation · AI-based Problem Solving and Planning · Recommender Systems and Techniques
