Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining

Xunlin Zhan; Yangxin Wu; Xiao Dong; Yunchao Wei; Minlong Lu; Yichi Zhang; Hang Xu; Xiaodan Liang

arXiv:2107.14572 · cs.CV · August 10, 2021 · 1 citation

TL;DR

This paper introduces Product1M, a large-scale multi-modal dataset for weakly supervised instance-level product retrieval, and proposes CAPTURE, a transformer-based model that effectively leverages multi-modal data for fine-grained product identification.

Contribution

The paper presents a new large-scale dataset, Product1M, and a novel model, CAPTURE, for weakly supervised multi-modal instance-level product retrieval, addressing real-world complexities.

Findings

1. CAPTURE outperforms state-of-the-art baselines.
2. Product1M contains over 1 million image-caption pairs.
3. Extensive ablations confirm model effectiveness.

Abstract

Nowadays, customers' demands for e-commerce are more diversified, which introduces more complications to the product retrieval industry. Previous methods are either limited to single-modal input or perform supervised image-level product retrieval, and thus fail to accommodate real-life scenarios where enormous amounts of weakly annotated multi-modal data are present. In this paper, we investigate a more realistic setting that aims to perform weakly supervised multi-modal instance-level product retrieval among fine-grained product categories. To promote the study of this challenging task, we contribute Product1M, one of the largest multi-modal cosmetic datasets for real-world instance-level retrieval. Notably, Product1M contains over 1 million image-caption pairs and consists of two sample types, i.e., single-product and multi-product samples, which encompass a wide variety of cosmetics brands. In…

Code & Models

Repositories

zhanxlin/product1m
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Residual Connection · Softmax · Dropout · Adam