eProduct: A Million-Scale Visual Search Benchmark to Address Product Recognition Challenges
Jiangbo Yuan, An-Ti Chiang, Wen Tang, Antonio Haro

TL;DR
eProduct is a large-scale benchmark dataset of 2.5 million product images for visual search and fine-grained recognition in e-commerce, a setting where many products differ only subtly in appearance and realistic benchmarks have been hard to build.
Contribution
The paper introduces eProduct, a comprehensive dataset for product recognition and visual search, including training and evaluation sets to facilitate research in super fine-grained recognition.
Findings
Baseline models show promising performance on eProduct.
eProduct covers diverse product categories and visual variations.
The dataset is intended to accelerate research in self-supervised, weakly-supervised, and multimodal learning for fine-grained recognition.
Abstract
Large-scale product recognition is one of the major applications of computer vision and machine learning in the e-commerce domain. Since the number of products is typically much larger than the number of categories of products, image-based product recognition is often cast as a visual search rather than a classification problem. It is also one of the instances of super fine-grained recognition, where there are many products with slight or subtle visual differences. It has always been a challenge to create a benchmark dataset for training and evaluation on various visual search solutions in a real-world setting. This motivated creation of eProduct, a dataset consisting of 2.5 million product images towards accelerating development in the areas of self-supervised learning, weakly-supervised learning, and multimodal learning, for fine-grained recognition. We present eProduct as a training…
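To make the "visual search rather than classification" framing concrete, here is a minimal sketch of embedding-based retrieval. It is not the paper's baseline: the embeddings are random stand-ins for the output of some image encoder, the function names (`l2_normalize`, `search`, `recall_at_k`) are hypothetical, and recall@k is assumed here as a typical retrieval metric rather than taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def search(query_emb, index_emb, k=5):
    """Return the indices and cosine similarities of the k nearest index items per query."""
    q = l2_normalize(query_emb)
    g = l2_normalize(index_emb)
    sims = q @ g.T                              # shape: (n_queries, n_index)
    topk = np.argsort(-sims, axis=1)[:, :k]     # highest similarity first
    return topk, np.take_along_axis(sims, topk, axis=1)


def recall_at_k(topk, query_labels, index_labels):
    """Fraction of queries whose top-k results contain an item with the same product label."""
    hits = (index_labels[topk] == query_labels[:, None]).any(axis=1)
    return float(hits.mean())


# Toy data (assumption): 100 "products" with 16-d embeddings, one index image each.
rng = np.random.default_rng(0)
index_emb = rng.normal(size=(100, 16))
index_labels = np.arange(100)

# Queries are slightly perturbed copies of the first 10 index embeddings.
query_emb = index_emb[:10] + 0.01 * rng.normal(size=(10, 16))
query_labels = np.arange(10)

topk, sims = search(query_emb, index_emb, k=5)
recall = recall_at_k(topk, query_labels, index_labels)
print("recall@5:", recall)
```

Unlike a classifier, this setup needs no retraining when products are added: a new product only requires appending its embedding to the index. At the scale of millions of images, the exhaustive matrix product would normally be replaced by an approximate nearest-neighbor index.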
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
