ESANS: Effective and Semantic-Aware Negative Sampling for Large-Scale Retrieval Systems
Haibo Xing, Kanefumi Matsuyama, Hao Deng, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

TL;DR
ESANS introduces a novel negative sampling method for large-scale retrieval systems, combining virtual sample generation and multimodal semantic clustering to improve diversity, reduce false negatives, and enhance retrieval performance.
Contribution
The paper presents ESANS, a negative sampling approach that integrates an Effective Dense Interpolation Strategy (EDIS) and Multimodal Semantic-Aware Clustering (MSAC) to address false negatives and semantic information deficiency in embedding-based retrieval.
Findings
ESANS improves retrieval accuracy in large-scale systems.
ESANS reduces false negatives through semantic-aware clustering.
Experiments show ESANS enhances efficiency and effectiveness.
Abstract
Industrial recommendation systems typically involve a two-stage process: retrieval and ranking, which aims to match users with millions of items. In the retrieval stage, classic embedding-based retrieval (EBR) methods depend on effective negative sampling techniques to enhance both performance and efficiency. However, existing techniques often suffer from false negatives, the high cost of ensuring sampling quality, and semantic information deficiency. To address these limitations, we propose Effective and Semantic-Aware Negative Sampling (ESANS), which integrates two key components: Effective Dense Interpolation Strategy (EDIS) and Multimodal Semantic-Aware Clustering (MSAC). EDIS generates virtual samples within the low-dimensional embedding space to improve the diversity and density of the sampling distribution while minimizing computational costs. MSAC refines the negative sampling…
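
To illustrate the general idea of generating virtual samples by interpolating in an embedding space, here is a minimal Python sketch. It is not the paper's EDIS procedure, whose details are not given in the abstract. It assumes that a virtual negative is a convex combination of two randomly chosen real negative embeddings. The function name `interpolate_negatives`, its parameters, and the toy data are hypothetical.

```python
import random

def interpolate_negatives(negatives, num_virtual, seed=0):
    """Build virtual negatives as convex combinations of pairs of real negatives.

    Assumption for illustration only: plain linear interpolation with a
    uniformly sampled mixing weight; the paper's actual strategy may differ.
    """
    rng = random.Random(seed)
    virtual = []
    for _ in range(num_virtual):
        a, b = rng.sample(negatives, 2)   # two distinct real negative embeddings
        lam = rng.random()                # mixing weight in [0, 1)
        virtual.append([lam * x + (1.0 - lam) * y for x, y in zip(a, b)])
    return virtual

# Toy 2-dimensional item embeddings (hypothetical data).
negatives = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
virtual = interpolate_negatives(negatives, num_virtual=4)
```

Because each virtual sample lies on the segment between two existing embeddings, the sampled set becomes denser without extra encoder forward passes, which is the kind of low-cost diversity gain the abstract attributes to interpolation in the embedding space.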
