SyNeg: LLM-Driven Synthetic Hard-Negatives for Dense Retrieval
Xiaopeng Li, Xiangyang Li, Hao Zhang, Zhaocheng Du, Pengyue Jia, Yichao Wang, Xiangyu Zhao, Huifeng Guo, Ruiming Tang

TL;DR
This paper introduces SyNeg, a novel framework that uses large language models to generate high-quality synthetic hard negatives, significantly enhancing dense retrieval performance and training stability.
Contribution
We propose a multi-attribute self-reflection prompting strategy and a hybrid sampling method leveraging LLMs to synthesize effective hard negatives for dense retrieval.
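To make the hybrid sampling idea concrete, the snippet below is a minimal sketch of mixing LLM-synthesized negatives with retriever-mined negatives for one query. It is not the authors' implementation: the function name `hybrid_sample`, the `synthetic_ratio` parameter, and the fixed-ratio mixing rule are all assumptions made for illustration, since the summary here does not specify how the two pools are combined.

```python
import random


def hybrid_sample(synthetic, retrieved, k, synthetic_ratio=0.5, seed=None):
    """Draw up to k negatives from two pools (hypothetical helper).

    synthetic: passages generated by an LLM as hard negatives
    retrieved: passages mined by an external retriever
    synthetic_ratio: assumed fraction of the k slots given to synthetic passages
    """
    rng = random.Random(seed)
    # Slots for synthetic negatives (floored), capped by what is available.
    n_syn = min(len(synthetic), int(k * synthetic_ratio))
    # Remaining slots go to retrieved negatives, capped likewise.
    n_ret = min(len(retrieved), k - n_syn)
    picked = rng.sample(synthetic, n_syn) + rng.sample(retrieved, n_ret)
    # Shuffle so the position in the list does not reveal the source pool.
    rng.shuffle(picked)
    return picked


synthetic_pool = ["syn_a", "syn_b", "syn_c"]
retrieved_pool = ["ret_a", "ret_b", "ret_c", "ret_d"]
negatives = hybrid_sample(synthetic_pool, retrieved_pool, k=4, seed=0)
```

With the default ratio, the four sampled negatives contain two passages from each pool. If a pool is smaller than its share, the function returns fewer than `k` items rather than padding from the other pool; that behavior is a design choice of this sketch, not something the paper states.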
Findings
Improved retrieval accuracy on five benchmark datasets.
Enhanced training stability with synthetic hard negatives.
Demonstrated the effectiveness of LLM-generated negatives in dense retrieval.
Abstract
The performance of dense retrieval (DR) is significantly influenced by the quality of negative sampling. Traditional DR methods primarily depend on naive negative sampling techniques or on mining hard negatives through an external retriever and meticulously crafted strategies. However, naive negative sampling often fails to capture the precise boundaries between positive and negative samples, whereas existing hard negative sampling methods are prone to false negatives, resulting in performance degradation and training instability. Recent advancements in large language models (LLMs) offer an innovative solution to these challenges by generating contextually rich and diverse negative samples. In this work, we present a framework that harnesses LLMs to synthesize high-quality hard negative samples. We first devise a multi-attribute self-reflection prompting strategy to…
Taxonomy
Topics: Handwritten Text Recognition Techniques
