Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models

Yilun Jin; Zheng Li; Chenwei Zhang; Tianyu Cao; Yifan Gao; Pratik Jayarao; Mao Li; Xin Liu; Ritesh Sarkhel; Xianfeng Tang; Haodong Wang; Zhengyang Wang; Wenju Xu; Jingfeng Yang; Qingyu Yin; Xian Li; Priyanka Nigam; Yi Xu; Kai Chen; Qiang Yang; Meng Jiang; Bing Yin

arXiv:2410.20745 · cs.LG · November 1, 2024 · 2 cites

TL;DR

Shopping MMLU is a comprehensive benchmark derived from real-world Amazon data, designed to evaluate the multi-task online shopping capabilities of large language models across diverse skills and languages.

Contribution

It introduces a multi-task benchmark of 57 tasks, built from real-world data and covering diverse skills, for evaluating LLMs in online shopping, and it hosts a related competition.
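This page does not say how the 57 per-task results are combined into a headline number. The sketch below shows one common convention for multi-task benchmarks: macro-averaging task scores within each skill and then averaging across skills. It assumes every task already reports a score in [0, 1]; the function `aggregate_scores`, the task names, and the skill labels are hypothetical and are not taken from the Shopping MMLU repository.

```python
from collections import defaultdict
from statistics import mean


def aggregate_scores(task_scores, task_to_skill):
    """Hypothetical macro-averaging of per-task scores.

    task_scores: dict mapping task name -> score in [0, 1] (assumed normalized)
    task_to_skill: dict mapping task name -> skill name (hypothetical grouping)
    Returns (per-skill scores, overall score).
    """
    if not task_scores:
        raise ValueError("need at least one task score")
    by_skill = defaultdict(list)
    for task, score in task_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {task!r} must lie in [0, 1]")
        by_skill[task_to_skill[task]].append(score)
    # Average tasks within each skill first.
    skill_scores = {skill: mean(scores) for skill, scores in by_skill.items()}
    # Then weight every skill equally, regardless of how many tasks it has.
    overall = mean(list(skill_scores.values()))
    return skill_scores, overall


if __name__ == "__main__":
    scores = {"task_a": 1.0, "task_b": 0.5, "task_c": 0.25}
    skills = {"task_a": "skill_1", "task_b": "skill_1", "task_c": "skill_2"}
    print(aggregate_scores(scores, skills))
```

With these made-up numbers, `skill_1` scores 0.75, `skill_2` scores 0.25, and the overall score is 0.5, whereas a plain average over the three tasks would give about 0.583. Averaging per skill first keeps a skill that happens to contain many tasks from dominating the headline number.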

Findings

1. Benchmarking reveals the strengths and weaknesses of existing LLMs on online shopping tasks.

2. The benchmark facilitates the development of more versatile and effective shopping assistant models.

3. Insights from the competition guide future research on LLM-based e-commerce applications.

Abstract

Online shopping is a complex multi-task, few-shot learning problem with a wide and evolving range of entities, relations, and tasks. However, existing models and benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by alleviating task-specific engineering efforts and by providing users with interactive conversations. Despite the potential, LLMs face unique challenges in online shopping, such as domain-specific concepts, implicit knowledge, and heterogeneous user behaviors. Motivated by the potential and challenges, we propose Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data. Shopping MMLU consists of 57 tasks covering 4 major…

Code & Models

Repositories

kl4805/shoppingmmlu (official, PyTorch)

Videos

Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models (SlidesLive)

Taxonomy

Topics: Recommender Systems and Techniques