Training Dense Retrievers with Multiple Positive Passages
Benben Wang, Minghao Tang, Hengran Zhang, Jiafeng Guo, Keping Bi

TL;DR
This paper systematically studies multi-positive training objectives for dense retrievers, demonstrating that the LSEPair loss offers superior robustness and performance by effectively leveraging dense supervision signals from LLMs and human annotations.
Contribution
It unifies and analyzes multiple multi-positive training objectives within a contrastive learning framework, providing theoretical insights and practical guidelines for improving retriever training with dense supervision.
Findings
LSEPair consistently outperforms other objectives in robustness and accuracy.
JointLH and SumMargLH are sensitive to the quality of the positive passages, so noisy positives can degrade their performance.
Random sampling (Rand1LH) is a reliable baseline for training.
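Rand1LH, listed above as a reliable baseline, can be read as drawing one positive per query at each step and training with the ordinary single-positive InfoNCE loss. The sketch below follows that reading. The uniform sampling, the temperature `tau`, and the helper names (`infonce`, `rand1lh_loss`) are assumptions for illustration, not the paper's implementation.

```python
import math
import random


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def infonce(pos_score, neg_scores, tau=1.0):
    """Single-positive InfoNCE: -log softmax probability of the positive vs. the negatives."""
    logits = [pos_score / tau] + [s / tau for s in neg_scores]
    return logsumexp(logits) - pos_score / tau


def rand1lh_loss(pos_scores, neg_scores, tau=1.0, rng=random):
    """Hypothetical Rand1LH: sample one positive uniformly (assumed), then apply InfoNCE."""
    sampled = rng.choice(pos_scores)
    return infonce(sampled, neg_scores, tau)


# Usage: one query with three labelled positives and three negatives (toy scores).
rng = random.Random(0)
loss = rand1lh_loss([2.0, 1.5, 0.3], [0.5, -0.2, 0.1], rng=rng)
print(round(loss, 4))
```

Each step optimizes a standard single-positive objective, yet over many epochs every labelled positive is eventually used, which is one plausible reason such a simple scheme holds up as a baseline.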
Abstract
Modern knowledge-intensive systems, such as retrieval-augmented generation (RAG), rely on effective retrievers to establish the performance ceiling for downstream modules. However, retriever training has been bottlenecked by sparse, single-positive annotations, which lead to false-negative noise and suboptimal supervision. While the advent of large language models (LLMs) makes it feasible to collect comprehensive multi-positive relevance labels at scale, the optimal strategy for incorporating these dense signals into training remains poorly understood. In this paper, we present a systematic study of multi-positive optimization objectives for retriever training. We unify representative objectives, including Joint Likelihood (JointLH), Summed Marginal Likelihood (SumMargLH), and Log-Sum-Exp Pairwise (LSEPair) loss, under a shared contrastive learning framework. Our theoretical analysis…
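The abstract names three multi-positive objectives without giving their formulas. The sketch below writes them in one common form over a single query's similarity scores, where `pos` holds the positive-passage scores, `neg` the negative scores, and `tau` is a temperature. These exact forms are assumptions, not the paper's definitions: for example, whether other positives enter JointLH's denominator, or whether its per-positive terms are summed or averaged, may differ in the paper. The function names are hypothetical.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def joint_lh(pos, neg, tau=1.0):
    """Assumed JointLH: -log of the product of per-positive likelihoods,
    i.e. a sum of InfoNCE terms, each positive competing only with the negatives."""
    neg_t = [n / tau for n in neg]
    return sum(logsumexp([p / tau] + neg_t) - p / tau for p in pos)


def sum_marg_lh(pos, neg, tau=1.0):
    """Assumed SumMargLH: -log of the total softmax mass assigned to the positive set."""
    pos_t = [p / tau for p in pos]
    neg_t = [n / tau for n in neg]
    return logsumexp(pos_t + neg_t) - logsumexp(pos_t)


def lse_pair(pos, neg, tau=1.0):
    """Assumed LSEPair: log(1 + sum over all (positive, negative) pairs of exp((s_n - s_p) / tau))."""
    margins = [(n - p) / tau for p in pos for n in neg]
    return logsumexp([0.0] + margins)


# Usage: toy scores for one query; the last positive is deliberately low-scoring.
pos = [2.0, 1.5, 0.3]
neg = [0.5, -0.2, 0.1]
for fn in (joint_lh, sum_marg_lh, lse_pair):
    print(fn.__name__, round(fn(pos, neg), 4))
```

With a single positive all three functions reduce to standard InfoNCE, so they only differ once several positives are available. Under these assumed forms, SumMargLH is already small when the positive set as a whole dominates, so one high-scoring positive can carry the loss; JointLH pushes every positive up individually, including a weak one; and LSEPair adds a penalty for every (positive, negative) pair in which the negative scores close to or above the positive.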
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Text and Document Classification Technologies
