MAKE: Vision-Language Pre-training based Product Retrieval in Taobao Search
Xiaoyang Zheng, Zilong Wang, Ke Xu, Sen Li, Tao Zhuang, Qingwen Liu, Xiaoyi Zeng

TL;DR
This paper introduces MAKE, a vision-language pre-training model designed for product retrieval in Taobao Search, effectively combining visual and textual information to improve retrieval accuracy and user experience.
Contribution
The paper proposes a novel cross-modal fusion module and a keyword enhancement mechanism, tailored for e-commerce retrieval, advancing vision-language pre-training in this domain.
Findings
MAKE outperforms existing V+L pre-training methods in retrieval tasks.
Deployment of MAKE significantly improves Taobao Search retrieval performance.
Extensive experiments validate the effectiveness of the proposed methods.
Abstract
Taobao Search consists of two phases: the retrieval phase and the ranking phase. Given a user query, the retrieval phase returns a subset of candidate products for the following ranking phase. Recently, the paradigm of pre-training and fine-tuning has shown its potential in incorporating visual clues into retrieval tasks. In this paper, we focus on solving the problem of text-to-multimodal retrieval in Taobao Search. We consider that users' attention to titles or images varies across products. Hence, we propose a novel Modal Adaptation module for cross-modal fusion, which helps assign appropriate weights to texts and images across products. Furthermore, in e-commerce search, user queries tend to be brief and thus lead to a significant semantic imbalance between user queries and product titles. Therefore, we design a separate text encoder and a Keyword Enhancement mechanism to enrich the…
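To make the idea of per-product modality weighting concrete, here is a minimal sketch of a generic gated fusion followed by dot-product retrieval. It is not the paper's Modal Adaptation module, whose internals are not described in this summary. The function names (`fuse`, `retrieve`), the gate vectors, and the toy embeddings are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a short list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse(title_vec, image_vec, gate_text, gate_image):
    # Hypothetical gate: one score per modality, normalized so the two
    # weights sum to 1 and can differ from product to product.
    w_text, w_image = softmax([dot(gate_text, title_vec),
                               dot(gate_image, image_vec)])
    fused = [w_text * t + w_image * i for t, i in zip(title_vec, image_vec)]
    return fused, (w_text, w_image)

def retrieve(query_vec, products, gate_text, gate_image, k=2):
    # Retrieval phase: score every product against the query embedding
    # and keep the top-k candidates for a later ranking phase.
    scored = []
    for name, title_vec, image_vec in products:
        fused, _ = fuse(title_vec, image_vec, gate_text, gate_image)
        scored.append((dot(query_vec, fused), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Toy 2-dimensional embeddings (assumed values, not real data).
products = [
    ("dress", [1.0, 0.0], [0.8, 0.2]),
    ("phone", [0.0, 1.0], [0.1, 0.9]),
    ("shoes", [0.5, 0.5], [0.5, 0.5]),
]
candidates = retrieve([1.0, 0.0], products, [1.0, 1.0], [1.0, 1.0], k=2)
print(candidates)
```

In a trained model the gate parameters would be learned, so a product whose image carries more signal than its title would receive a larger image weight.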
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
