Unlocking Markets: A Multilingual Benchmark to Cross-Market Question Answering
Yifei Yuan, Yang Deng, Anders Søgaard, Mohammad Aliannejadi

TL;DR
This paper introduces MCPQA, a multilingual cross-market product question answering task, together with a benchmark dataset of over 7 million questions, and shows that leveraging cross-market data improves both answer generation and question ranking.
Contribution
The paper presents a large-scale multilingual dataset for cross-market product question answering and benchmarks various models, highlighting the benefits of cross-market information integration.
Findings
Cross-market data improves answer accuracy.
LLMs outperform traditional models in this task.
Multilingual dataset enables broader applicability.
Abstract
Users post numerous product-related questions on e-commerce platforms, and the answers affect their purchase decisions. Product-related question answering (PQA) entails utilizing product-related resources to provide precise responses to users. We propose a novel task of Multilingual Cross-market Product-based Question Answering (MCPQA) and define the task as providing answers to product-related questions in a main marketplace by utilizing information from another resource-rich auxiliary marketplace in a multilingual context. We introduce a large-scale dataset comprising over 7 million questions from 17 marketplaces across 11 languages. We then perform automatic translation on the Electronics category of our dataset and name the result McMarket. We focus on two subtasks: review-based answer generation and product-related question ranking. For each subtask, we label a subset of McMarket using an LLM and…
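To make the question-ranking subtask concrete, the sketch below ranks previously asked questions from a main and an auxiliary marketplace against a new user question. It is only an illustration under stated assumptions, not the paper's method: the `Candidate` type, the `rank_questions` function, the marketplace labels, and the Jaccard token-overlap scorer are all hypothetical, and the text is assumed to have already been translated into a shared language.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A previously asked question with its answer (hypothetical schema)."""
    market: str    # illustrative marketplace label, e.g. "main" or "aux"
    question: str
    answer: str


def tokens(text: str) -> set[str]:
    # Lowercase and split on non-word characters.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    # Token-overlap similarity; a stand-in for a learned ranker.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank_questions(query, main_pool, aux_pool, top_k=3):
    # Pool candidates from both marketplaces, then sort by similarity
    # to the query, most similar first.
    pool = list(main_pool) + list(aux_pool)
    ranked = sorted(pool, key=lambda c: jaccard(query, c.question), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    main_pool = [Candidate("main", "Is the strap included?", "Yes.")]
    aux_pool = [Candidate("aux", "Does this camera support 4K video recording?",
                          "Yes, it does.")]
    for c in rank_questions("Does this camera support 4K video?", main_pool, aux_pool):
        print(c.market, "|", c.question)
```

In this toy setup the auxiliary marketplace supplies the closest match, which is the situation the task targets: a resource-poor main marketplace borrowing from a resource-rich one.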
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
