ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models
Haibin Chen, Kangtao Lv, Chengwei Hu, Yanshi Li, Yujin Yuan, Yancheng He, Xingyao Zhang, Langming Liu, Shilei Liu, Wenbo Su, Bo Zheng

TL;DR
ChineseEcomQA is a scalable question-answering benchmark designed to evaluate large language models' understanding of fundamental e-commerce concepts. It addresses the heterogeneity of e-commerce tasks and balances generality with specificity within the domain.
Contribution
The paper introduces ChineseEcomQA, a novel benchmark for e-commerce concept evaluation that combines LLM validation, RAG validation, and manual annotation to handle diverse tasks.
Findings
Mainstream LLMs show varying performance on ChineseEcomQA.
The benchmark effectively differentiates between general and specific e-commerce concepts.
ChineseEcomQA guides future domain-specific LLM evaluations.
Abstract
With the increasing use of Large Language Models (LLMs) in fields such as e-commerce, domain-specific concept evaluation benchmarks are crucial for assessing their domain capabilities. Existing LLMs may generate factually incorrect information within complex e-commerce applications. Therefore, it is necessary to build an e-commerce concept benchmark. Existing benchmarks encounter two primary challenges: (1) handling the heterogeneous and diverse nature of tasks, and (2) distinguishing between generality and specificity within the e-commerce field. To address these problems, we propose ChineseEcomQA, a scalable question-answering benchmark focused on fundamental e-commerce concepts. ChineseEcomQA is built on three core characteristics: Focus on Fundamental Concept, E-commerce Generality, and E-commerce Expertise. Fundamental concepts are designed to be…
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Computational and Text Analysis Methods
