Multimodal semantic retrieval for product search

Dong Liu; Esther Lopez Ramos

arXiv:2501.07365 · cs.IR · February 18, 2025

TL;DR

This paper explores multimodal representations that combine text and images for product semantic retrieval in e-commerce, showing improvements over text-only models in relevance accuracy or purchase recall.

Contribution

It introduces a multimodal product representation scheme that combines text and images, evaluates it against pure-text product representations, and shows improved semantic retrieval performance (purchase recall or relevance accuracy) in e-commerce search.
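As a rough illustration of what a multimodal product representation can look like, the following Python sketch fuses a text embedding and an image embedding into a single product vector. The fusion rule (a weighted sum of unit-normalized vectors controlled by a hypothetical `alpha` weight) and the function names are assumptions for illustration, not the paper's scheme; it also assumes both encoders already project into a shared space of equal dimension.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_product_embedding(
    text_vec: Sequence[float],
    image_vec: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Hypothetical fusion: weighted sum of unit-normalized text and image embeddings.

    Assumption: both encoders output vectors of the same dimension in a shared space.
    alpha=1.0 keeps only the text embedding (a text-only product representation).
    """
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    t = l2_normalize(text_vec)
    v = l2_normalize(image_vec)
    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(t, v)]
    # Renormalize so fused vectors are comparable by cosine or dot-product scoring.
    return l2_normalize(fused)
```

Setting `alpha=1.0` reduces this to a text-only product representation, so the text-only vs. multimodal comparison becomes a one-parameter change. Concatenating the two embeddings or training a learned fusion layer are common alternatives to a fixed weighted sum.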

Findings

01. Multimodal representations improve relevance accuracy.

02. Multimodal models increase purchase recall.

03. Numerical analysis validates multimodal advantages.
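Purchase recall is not defined on this page. One plausible reading, sketched below as an assumption, is the fraction of (query, purchased product) pairs in which the purchased product appears among the top-k retrieved results; the function name and the dictionary-based data layout are hypothetical.

```python
from typing import Dict, List, Set


def purchase_recall_at_k(
    retrieved: Dict[str, List[str]],
    purchased: Dict[str, Set[str]],
    k: int,
) -> float:
    """Hypothetical purchase recall@k over (query, purchased product) pairs.

    retrieved: query -> ranked list of product IDs returned by the retriever.
    purchased: query -> set of product IDs actually purchased for that query.
    Returns hits / total pairs, or 0.0 when there are no purchases.
    """
    hits = 0
    total = 0
    for query, items in purchased.items():
        # Only the first k results of the ranked list count as retrieved.
        top_k = set(retrieved.get(query, [])[:k])
        for item in items:
            total += 1
            if item in top_k:
                hits += 1
    return hits / total if total else 0.0
```

Counting per (query, purchase) pair weights queries with several purchases more heavily; averaging a per-query recall instead would weight every query equally.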

Abstract

Semantic retrieval (also known as dense retrieval) based on textual data has been extensively studied in both web search and product search, where the relevance between a query and a candidate document is computed by comparing their dense vector representations. Product images are crucial to e-commerce search interactions and are a key factor for customers exploring products. However, their impact on semantic retrieval has not yet been well studied. In this research, we build a multimodal representation for product items in e-commerce search, in contrast to a pure-text representation of products, and investigate the impact of such representations. The models are developed and evaluated on e-commerce datasets. We demonstrate that a multimodal representation scheme for a product can improve either purchase recall or relevance accuracy in semantic retrieval.…
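The abstract describes relevance as a comparison of dense vectors. A minimal brute-force sketch of that idea follows, assuming cosine similarity as the comparison and a small in-memory index that maps hypothetical product IDs to embeddings. Production systems typically use an approximate nearest-neighbor index instead, and the product vectors here could come from either a text-only or a multimodal encoder.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def dense_retrieve(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Brute-force top-k retrieval: score every product against the query vector."""
    scored = ((cosine(query_vec, vec), pid) for pid, vec in index.items())
    # nlargest returns the k highest (score, pid) pairs in descending order.
    top = heapq.nlargest(k, scored)
    return [(pid, score) for score, pid in top]
```

Usage: with `index = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.7, 0.7]}`, calling `dense_retrieve([1.0, 0.1], index, k=2)` ranks `p1` first and `p3` second. If all vectors are unit-normalized in advance, the cosine reduces to a plain dot product, which is what inner-product ANN indexes compute.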

Code & Models

Repositories

mayurbhangale/multimodal-retrieval (PyTorch)