MAKE: Vision-Language Pre-training based Product Retrieval in Taobao Search

Xiaoyang Zheng; Zilong Wang; Ke Xu; Sen Li; Tao Zhuang; Qingwen Liu; Xiaoyi Zeng

arXiv:2301.12646 · cs.IR · February 21, 2023

Open Access

TL;DR

This paper introduces MAKE, a vision-language pre-training model for text-to-multimodal product retrieval in Taobao Search. It combines product images and titles so that the retrieval phase returns better candidate products for a user query.

Contribution

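The paper proposes a Modal Adaptation module for cross-modal fusion, which weights text and image per product, and a Keyword Enhancement mechanism paired with a separate text encoder, which addresses the semantic imbalance between brief user queries and product titles. Both are tailored to e-commerce retrieval.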

Findings

01. MAKE outperforms existing V+L pre-training methods in retrieval tasks.

02. Deployment of MAKE significantly improves Taobao Search retrieval performance.

03. Extensive experiments validate the effectiveness of the proposed methods.

Abstract

Taobao Search consists of two phases: the retrieval phase and the ranking phase. Given a user query, the retrieval phase returns a subset of candidate products for the following ranking phase. Recently, the paradigm of pre-training and fine-tuning has shown its potential in incorporating visual clues into retrieval tasks. In this paper, we focus on solving the problem of text-to-multimodal retrieval in Taobao Search. We consider that users' attention to titles or images varies across products. Hence, we propose a novel Modal Adaptation module for cross-modal fusion, which helps assign appropriate weights to texts and images across products. Furthermore, in e-commerce search, user queries tend to be brief, which leads to a significant semantic imbalance between user queries and product titles. Therefore, we design a separate text encoder and a Keyword Enhancement mechanism to enrich the…
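To make the idea of per-product modality weighting concrete, the following is a minimal illustrative sketch in plain Python. It is not the paper's Modal Adaptation module, whose architecture the abstract does not specify. It assumes a simple softmax gate over two scalar scores, and every name in it (`fuse`, `retrieve`, `gate_text`, `gate_image`) is hypothetical. It shows how a product's text and image embeddings might be combined with product-specific weights, and how a query embedding could then be matched against the fused product embeddings.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(text_emb, image_emb, gate_text, gate_image):
    """Combine text and image embeddings with per-product weights.

    Assumption: the gate is one linear score per modality followed by a
    softmax. The paper's actual fusion mechanism may differ.
    """
    w_text, w_image = softmax(
        [dot(gate_text, text_emb), dot(gate_image, image_emb)]
    )
    fused = [w_text * t + w_image * v for t, v in zip(text_emb, image_emb)]
    return fused, (w_text, w_image)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def retrieve(query_emb, products, gate_text, gate_image, k=2):
    """Return the ids of the k products most similar to the query."""
    scored = []
    for product_id, (text_emb, image_emb) in products.items():
        fused, _ = fuse(text_emb, image_emb, gate_text, gate_image)
        scored.append((cosine(query_emb, fused), product_id))
    scored.sort(reverse=True)
    return [product_id for _, product_id in scored[:k]]


# Toy data: two products, each with a (text, image) embedding pair.
products = {
    "a": ([1.0, 0.0], [0.0, 1.0]),
    "b": ([1.0, 0.0], [1.0, 0.0]),
}
gate_text = [1.0, 0.0]   # hypothetical gate parameters
gate_image = [0.0, 1.0]  # hypothetical gate parameters

top = retrieve([1.0, 0.0], products, gate_text, gate_image, k=1)
```

In this toy setup the weights differ per product because the gate scores depend on each product's own embeddings, which is the behaviour the abstract attributes to adaptive fusion. In a trained model the gate parameters would be learned rather than fixed by hand.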

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques