ProMap: Datasets for Product Mapping in E-commerce
Kateřina Macková, Martin Pilát

TL;DR
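ProMap introduces two datasets for product mapping in e-commerce, ProMapCz (1,495 Czech product pairs) and ProMapEn (1,555 English product pairs), each with product images and textual descriptions. They address two limitations of existing datasets, incomplete product information and non-matching pairs that are too dissimilar, so that models can be trained and evaluated on telling apart matching pairs from very similar non-matching ones.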
Contribution
The paper presents two new detailed datasets for product mapping, with carefully selected similar non-matching pairs, enhancing research and model training in e-commerce product matching.
Findings
Datasets contain images and textual descriptions, including specifications.
Non-matching pairs are carefully selected to be very similar, increasing difficulty.
Baseline models demonstrate the datasets' complexity and usefulness.
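To illustrate why very similar non-matching pairs are harder than distant ones, here is a minimal sketch using token-level Jaccard similarity on product names. This is not the paper's baseline; the function names and the product listings are invented for illustration and do not come from the ProMap datasets.

```python
def tokenize(text: str) -> set:
    """Lowercase the text and split on whitespace into a set of tokens."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the token sets of two strings."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


# Hypothetical listings, invented for this example (not ProMap data).
reference = "acme phone x 128gb black"
matching = "acme phone x black 128gb smartphone"   # same product, other shop
similar_non_match = "acme phone x 256gb black"     # differs only in storage
distant_non_match = "bolt cordless drill 18v"      # unrelated product

score_match = jaccard(reference, matching)
score_similar = jaccard(reference, similar_non_match)
score_distant = jaccard(reference, distant_non_match)

print(f"match:             {score_match:.2f}")
print(f"similar non-match: {score_similar:.2f}")
print(f"distant non-match: {score_distant:.2f}")
```

In this toy case the distant non-match scores zero and is trivially separable, while the similar non-match scores close to the true match. A dataset containing only distant non-matches would not expose that weakness, which is the gap the carefully selected non-matching pairs are meant to close.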
Abstract
The goal of product mapping is to decide whether two listings from two different e-shops describe the same product. Existing datasets of matching and non-matching pairs of products, however, often suffer from incomplete product information or contain only very distant non-matching products. Therefore, while predictive models trained on these datasets achieve good results on them, they are unusable in practice because they cannot distinguish very similar but non-matching pairs of products. This paper introduces two new datasets for product mapping: ProMapCz, consisting of 1,495 Czech product pairs, and ProMapEn, consisting of 1,555 English product pairs of matching and non-matching products manually scraped from two pairs of e-shops. The datasets contain both images and textual descriptions of the products, including their specifications, making them one of the most complete datasets for…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Web Data Mining and Analysis
