Embedding based retrieval for long tail search queries in ecommerce
Akshay Kekuda, Yuyang Zhang, Arun Udayashankar

TL;DR
This paper details a series of model optimizations for long-tail ecommerce search queries, including leveraging large language models, pretraining, and query-to-query finetuning, resulting in a 3% increase in conversion.
Contribution
The paper introduces optimization techniques for embedding-based retrieval models tailored to long-tail ecommerce queries, aimed at improving conversion rates.
Findings
3% increase in conversion rate from A/B testing
Effective use of large language models to reduce the sparsity of conversion signals
Improved evaluation through a curated human-in-the-loop dataset
Abstract
In this abstract we present a series of optimizations we performed on the two-tower model architecture [14] and on the training and evaluation datasets to implement semantic product search at Best Buy. Search queries on bestbuy.com follow a Pareto distribution, whereby a minority of queries account for most searches. This leaves us with a long tail of search queries that are issued infrequently. The queries in the long tail suffer from very sparse interaction signals. Our current work focuses on building a model to serve the long-tail queries. We present a series of optimizations we have made to this model to maximize conversion for the purpose of retrieval from the catalog. The first optimization we present is using a large language model to mitigate the sparsity of conversion signals. The second optimization is pretraining an off-the-shelf transformer-based model on the Best Buy catalog…
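The two-tower architecture [14] named in the abstract splits retrieval into a query tower and a product tower. Each tower maps its input to a fixed-size vector. Product vectors are computed once offline, and at search time only the query is encoded and scored against them. The following is a minimal Python sketch of that flow under stated assumptions. The bag-of-words `ToyTower` encoder, the toy catalog, and the functions `build_index` and `retrieve` are hypothetical stand-ins, not the Best Buy model, which according to the abstract starts from a pretrained transformer-based encoder.

```python
import heapq
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2_normalize(vec: Vector) -> Vector:
    # Unit-length vectors make the dot product equal to cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


class ToyTower:
    """Hypothetical stand-in encoder: bag-of-words counts over a fixed vocabulary.

    A real tower would be a (pretrained, fine-tuned) transformer. Only the
    interface matters here: text in, fixed-size L2-normalized vector out.
    """

    def __init__(self, vocab: List[str]):
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.dim = len(vocab)

    def encode(self, text: str) -> Vector:
        vec = [0.0] * self.dim
        for tok in text.lower().split():
            if tok in self.index:  # out-of-vocabulary tokens are ignored
                vec[self.index[tok]] += 1.0
        return l2_normalize(vec)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_index(item_tower: ToyTower, catalog: Dict[str, str]) -> Dict[str, Vector]:
    # Offline step: embed every product once and store the vectors.
    return {sku: item_tower.encode(title) for sku, title in catalog.items()}


def retrieve(query: str, query_tower: ToyTower,
             index: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    # Online step: embed only the query, then score it against every product.
    q = query_tower.encode(query)
    scored = ((sku, dot(q, vec)) for sku, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Usage with a hypothetical three-product catalog.
catalog = {
    "sku-1": "sony noise cancelling headphones",
    "sku-2": "samsung 65 inch oled tv",
    "sku-3": "apple iphone case",
}
vocab = sorted({tok for title in catalog.values() for tok in title.split()})
query_tower = ToyTower(vocab)  # separate instances mirror the two towers
item_tower = ToyTower(vocab)
index = build_index(item_tower, catalog)
print(retrieve("wireless noise cancelling headphones for travel", query_tower, index, k=1))
```

The exhaustive scan in `retrieve` keeps the sketch dependency-free. Production systems usually replace it with an approximate nearest-neighbor index over the precomputed product vectors.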
