Mine and Refine: Optimizing Graded Relevance in E-commerce Search Retrieval
Jiaqi Xi, Raghav Saboo, Luming Chen, Martin Wang, Sudeep Das

TL;DR
This paper introduces a two-stage contrastive training framework for e-commerce search embeddings that models graded relevance and improves robustness, leading to better retrieval relevance and statistically significant engagement gains.
Contribution
It presents a novel two-stage 'Mine and Refine' approach combining supervised contrastive learning and multi-class boundary sharpening for relevance-aware embeddings.
Findings
Significant improvements in retrieval relevance.
Statistically significant engagement gains.
Enhanced robustness through data augmentation.
Abstract
We propose a two-stage "Mine and Refine" contrastive training framework for semantic text embeddings to enhance multi-category e-commerce search retrieval. Large-scale e-commerce search demands embeddings that generalize to long-tail, noisy queries while adhering to scalable supervision compatible with product and policy constraints. A practical challenge is that relevance is often graded: users accept substitutes or complements beyond exact matches, and production systems benefit from clear separation of similarity scores across these relevance strata for stable hybrid blending and thresholding. To obtain scalable, policy-consistent supervision, we fine-tune a lightweight LLM on human annotations under a three-level relevance guideline and further reduce residual noise via engagement-driven auditing. In Stage 1, we train a multilingual Siamese two-tower retriever with a label-aware…
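The abstract is cut off before it specifies the Stage 1 objective, so the following is only a minimal sketch of a generic supervised contrastive (SupCon-style) loss for a single query, not the authors' loss. The function names, the `positive_label` and `temperature` parameters, and the choice to treat only the top relevance level as positive are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def supcon_loss(query, items, labels, positive_label=2, temperature=0.1):
    """Generic SupCon-style loss for one query embedding (illustrative only).

    labels are graded relevance levels (assumed here: 0, 1, 2); items whose
    label equals positive_label are treated as positives, all others as
    negatives. This collapses the grades and is a simplification.
    """
    logits = [cosine(query, item) / temperature for item in items]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_denom = top + math.log(sum(math.exp(z - top) for z in logits))
    positives = [z for z, y in zip(logits, labels) if y == positive_label]
    if not positives:
        return 0.0  # no positive for this query: contributes nothing
    # Mean negative log-probability of the positives under the softmax.
    return -sum(z - log_denom for z in positives) / len(positives)

query = [1.0, 0.0]
items = [[1.0, 0.0], [0.0, 1.0]]
aligned = supcon_loss(query, items, labels=[2, 0])     # positive is the near item
misaligned = supcon_loss(query, items, labels=[0, 2])  # positive is the far item
```

In this toy setup the loss is small when the positive item lies close to the query and large when it does not, which is the behaviour a contrastive retriever objective is meant to reward.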
Taxonomy
Topics: Information Retrieval and Search Behavior · Text and Document Classification Technologies · Topic Modeling
