Embedding based retrieval for long tail search queries in ecommerce

Akshay Kekuda; Yuyang Zhang; Arun Udayashankar

arXiv:2505.01946·cs.IR·May 27, 2025


TL;DR

This paper details a series of optimizations to an embedding-based retrieval model for long tail ecommerce search queries, including leveraging large language models, pretraining, and query-to-query finetuning, resulting in a 3% increase in conversion rate in A/B testing.

Contribution

The paper introduces specific optimization techniques for embedding-based retrieval models tailored to long tail ecommerce queries, enhancing conversion rates.

Findings

01

3% increase in conversion rate from A/B testing

02

Effective use of large language models to reduce the sparsity of conversion signals

03

Improved evaluation through curated human-in-the-loop dataset

Abstract

In this abstract we present a series of optimizations we performed on the two-tower model architecture [14], and on the training and evaluation datasets, to implement semantic product search at Best Buy. Search queries on bestbuy.com follow the Pareto distribution, whereby a minority of them account for most searches. This leaves us with a long tail of search queries that are issued infrequently. The queries in the long tail suffer from very sparse interaction signals. Our current work focuses on building a model to serve the long tail queries. We present a series of optimizations to this model that aim to maximize conversion when retrieving products from the catalog. The first optimization we present is using a large language model to mitigate the sparsity of conversion signals. The second optimization is pretraining an off-the-shelf transformer-based model on the Best Buy catalog…
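
To make the head versus long tail distinction concrete, here is a minimal sketch of how a query log might be split by cumulative search volume. It is an illustration only, not the authors' method: the function name split_head_tail, the 80% cutoff, and the toy log are all assumptions introduced here.

```python
from collections import Counter


def split_head_tail(query_log, head_share=0.8):
    """Split distinct queries into head and tail by cumulative search volume.

    head_share is a hypothetical cutoff chosen for illustration; the paper
    does not state how the long tail is delimited.
    """
    counts = Counter(query_log)
    total = sum(counts.values())
    head, tail = [], []
    covered = 0
    # most_common() yields queries from most to least frequent.
    for query, n in counts.most_common():
        # A query is in the head while the volume covered so far is
        # still below the cutoff; everything after that is the tail.
        if covered < head_share * total:
            head.append(query)
        else:
            tail.append(query)
        covered += n
    return head, tail


# Toy log (made-up data): two popular queries and a few rare ones.
log = (
    ["iphone"] * 50
    + ["tv"] * 30
    + ["usb c hub 7 port"] * 2
    + ["left joycon strap"]
    + ["hdmi 2.1 cable 10ft"] * 2
)
head, tail = split_head_tail(log)
```

In this toy log, two distinct queries cover most of the volume while three rare queries fall into the tail, which is where interaction signals such as clicks and conversions become sparse.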
