OxyEcomBench: Benchmarking Multimodal Foundation Models across E-Commerce Ecosystems
Yong Liu, Ximan Liu, Guoqing Yang, Bing Bai, Xiaoqiang Xu, Zhen Chen, Ke Zhang, Yan Li

TL;DR
OxyEcomBench is a multimodal benchmark that quantifies the gap between the open-domain and e-commerce performance of large language and multimodal models, covering diverse stakeholders and tasks in real-world e-commerce.
Contribution
The paper introduces OxyEcomBench, a holistic, multi-stakeholder benchmark of approximately 6,300 instances across 29 tasks, sourced from authentic bilingual Chinese-English e-commerce data and supporting various input modalities and difficulty levels.
Findings
Leading models show modest performance on the benchmark.
Performance gaps are reduced on OxyEcomBench, highlighting domain-specific knowledge gaps.
The benchmark emphasizes visually salient multimodal cases with key evidence in images.
Abstract
LLMs and MLLMs have become indispensable tools across a wide range of applications. E-commerce, however, poses distinctive challenges -- including intricate domain knowledge, long-tail product evidence, heterogeneous visual data, and the interplay among multiple stakeholder roles -- that diverge substantially from the general world knowledge these models are primarily trained on, often causing a notable gap between their open-domain and e-commerce performance. To systematically quantify this gap, we introduce OxyEcomBench, a unified multimodal benchmark comprising approximately 6,300 high-quality instances for real-world bilingual Chinese--English e-commerce. Although several e-commerce benchmarks have been proposed, they typically adopt a single stakeholder perspective, target a narrow set of tasks, or address isolated challenges, making it difficult to holistically assess models'…
