Learning a Unified Embedding for Visual Search at Pinterest
Andrew Zhai, Hao-Yu Wu, Eric Tzeng, Dong Huk Park, Charles Rosenberg

TL;DR
This paper presents a multi-task deep metric learning system that creates a single unified image embedding for Pinterest's visual search, improving relevance, engagement, and operational efficiency across multiple applications.
Contribution
The work introduces a unified embedding approach that combines multiple visual search objectives into one deep neural network, outperforming specialized embeddings and reducing maintenance costs.
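As a rough illustration of what "multiple objectives in one network" can look like, the sketch below shares one embedding function across tasks and sums a per-task classification loss. It is not the authors' implementation: the linear trunk, the softmax cross-entropy loss, and the task names `browse` and `shop` are all assumptions made for the example, since this summary does not specify the architecture or the loss.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def embed(trunk: Matrix, image_features: Vector) -> Vector:
    """Shared trunk (assumed linear here): every task reads this one embedding."""
    return l2_normalize(matvec(trunk, image_features))


def softmax_cross_entropy(logits: Vector, label: int) -> float:
    """Numerically stable cross-entropy for a single example."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def multi_task_loss(
    trunk: Matrix,
    heads: Dict[str, Matrix],
    batch: Dict[str, List[Tuple[Vector, int]]],
) -> float:
    """Sum the loss of every task; all tasks backpropagate into the same trunk."""
    total = 0.0
    for task, examples in batch.items():
        for features, label in examples:
            logits = matvec(heads[task], embed(trunk, features))
            total += softmax_cross_entropy(logits, label)
    return total


# Hypothetical toy setup: a 2-D identity trunk and two task-specific heads.
trunk = [[1.0, 0.0], [0.0, 1.0]]
heads = {"browse": [[1.0, 0.0], [0.0, 1.0]], "shop": [[0.0, 0.0], [0.0, 0.0]]}
batch = {"browse": [([1.0, 0.0], 0)], "shop": [([0.0, 2.0], 1)]}
loss = multi_task_loss(trunk, heads, batch)
```

Because each task's head consumes the same `embed` output, only one embedding has to be computed, stored, and served per image, which is where the maintenance savings described above would come from.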
Findings
The unified embedding outperforms the specialized embeddings previously deployed for each product in both relevance and engagement.
Joint training on diverse data sources enhances embedding quality.
Binarized embeddings maintain high precision and recall for efficient retrieval.
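To make the binarization finding concrete, here is a minimal sketch of how a float embedding can be reduced to a bit code and compared by Hamming distance. The zero threshold, the bit-packing into a Python `int`, and the brute-force `nearest` helper are assumptions chosen for illustration; this summary does not describe the binarization scheme or the retrieval index the authors actually use.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float], threshold: float = 0.0) -> int:
    """Pack one bit per dimension: bit i is 1 when embedding[i] > threshold."""
    bits = 0
    for i, value in enumerate(embedding):
        if value > threshold:
            bits |= 1 << i
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def nearest(query: int, index: List[int], k: int = 1) -> List[int]:
    """Return the positions of the k codes in index closest to query.

    Brute force for clarity; ties keep their original index order.
    """
    order = sorted(range(len(index)), key=lambda j: hamming_distance(query, index[j]))
    return order[:k]


# Toy 4-dimensional embeddings (made-up values).
index = [
    binarize([1.0, 1.0, 1.0, 1.0]),
    binarize([0.4, -0.1, 0.3, -0.5]),
    binarize([-1.0, -1.0, -1.0, -1.0]),
]
query = binarize([0.5, -0.2, 0.1, -0.9])
closest = nearest(query, index, k=2)
```

A binary code needs one bit per dimension instead of a 32-bit float, and the XOR-plus-popcount comparison is cheap, which is the usual motivation for binarizing embeddings in large-scale retrieval.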
Abstract
At Pinterest, we utilize image embeddings throughout our search and recommendation systems to help our users navigate through visual content by powering experiences like browsing of related content and searching for exact products for shopping. In this work we describe a multi-task deep metric learning system to learn a single unified image embedding which can be used to power our multiple visual search products. The solution we present not only allows us to train for multiple application objectives in a single deep neural network architecture, but takes advantage of correlated information in the combination of all training data from each application to generate a unified embedding that outperforms all specialized embeddings previously deployed for each product. We discuss the challenges of handling images from different domains such as camera photos, high quality web images, and clean…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
