Retail-786k: a Large-Scale Dataset for Visual Entity Matching
Bianca Lamm (1, 2), Janis Keuper (1) ((1) IMLA, Offenburg University, (2) Markant Services International GmbH)

TL;DR
This paper introduces Retail-786k, a large-scale dataset for visual entity matching in retail, provides baseline evaluations, and argues that the task calls for new algorithms beyond standard classification methods.
Contribution
The paper presents the first publicly available large-scale dataset for visual entity matching in retail, enabling research on transferring visual equivalence classes to new data.
Findings
Standard image classification methods are insufficient for visual entity matching.
Novel transfer-based approaches are needed for this problem.
The dataset facilitates benchmarking of such new algorithms.
Abstract
Entity Matching (EM) defines the task of learning to group objects by transferring semantic concepts from example groups (=entities) to unseen data. Despite the general availability of image data in the context of many EM problems, most currently available EM algorithms rely solely on (textual) metadata. In this paper, we introduce the first publicly available large-scale dataset for "visual entity matching", based on a production-level use case in the retail domain. Using scanned advertisement leaflets, collected over several years from different European retailers, we provide a total of ~786k manually annotated, high-resolution product images containing ~18k different individual retail products which are grouped into ~3k entities. The annotation of these product entities is based on a price comparison task, where each entity forms an equivalence class of comparable products.…
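To make the task definition concrete, here is a minimal sketch of matching an unseen item to an entity by comparing it against example groups. It is not the paper's method or one of its baselines: it assumes each image has already been reduced to a feature vector, and the entity names, the toy 3-dimensional vectors, and the helpers `cosine`, `centroid`, and `match_entity` are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def match_entity(query, entities):
    """Return (entity_id, similarity) for the entity whose centroid is most similar to query."""
    best_id, best_sim = None, -1.0
    for entity_id, examples in entities.items():
        sim = cosine(query, centroid(examples))
        if sim > best_sim:
            best_id, best_sim = entity_id, sim
    return best_id, best_sim


# Hypothetical example groups: each entity is an equivalence class of
# comparable products, represented here by made-up feature vectors.
entities = {
    "cola_1.5l": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "butter_250g": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}

# Made-up feature vector of an unseen product image.
query = [0.85, 0.15, 0.05]

entity_id, similarity = match_entity(query, entities)
print(entity_id)  # prints "cola_1.5l"
```

Adding an entity in this sketch only means adding a new group of examples, with no fixed set of output classes to retrain. That illustrates the difference between matching against example groups and closed-set classification, but says nothing about how well such a scheme would perform on Retail-786k.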
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Topic Modeling
