Training Dense Retrievers with Multiple Positive Passages

Benben Wang; Minghao Tang; Hengran Zhang; Jiafeng Guo; Keping Bi

arXiv:2602.12727 · cs.IR · February 16, 2026

TL;DR

This paper systematically studies multi-positive training objectives for dense retrievers, demonstrating that the LSEPair loss offers superior robustness and performance by effectively leveraging dense supervision signals from LLMs and human annotations.

Contribution

It unifies and analyzes multiple multi-positive training objectives within a contrastive learning framework, providing theoretical insights and practical guidelines for improving retriever training with dense supervision.

Findings

01

LSEPair consistently outperforms other objectives in robustness and accuracy.

02

JointLH and SumMargLH are sensitive to the quality of the positive labels, so their performance depends on how reliable those positives are.

03

Random sampling (Rand1LH) is a reliable baseline for training.
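Finding 03 names the Rand1LH baseline without defining it on this page. The sketch below is a minimal illustration that assumes Rand1LH means drawing one positive uniformly at random per query and applying a standard single-positive InfoNCE (softmax cross-entropy) loss against the negatives. Whether the other positives are excluded from the denominator, and the absence of a temperature, are our assumptions; the function name `rand1_lh_loss` is hypothetical.

```python
import math
import random


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def rand1_lh_loss(pos_scores, neg_scores, rng=random):
    """Hypothetical Rand1LH sketch (assumed form, not the paper's code).

    pos_scores: query-passage scores for all labeled positives.
    neg_scores: scores for negatives (e.g. in-batch or mined).
    One positive is sampled uniformly; the loss is the negative
    log-softmax of that positive over {sampled positive} + negatives.
    Other positives are left out of the denominator (an assumption).
    """
    s_p = rng.choice(list(pos_scores))
    return -s_p + log_sum_exp([s_p] + list(neg_scores))
```

Usage: `rand1_lh_loss([2.0, 1.5], [0.3, -0.1], rng=random.Random(0))` returns the loss for whichever positive the seeded generator draws. Because a fresh positive is sampled each time a query is visited, every positive can still contribute over many epochs even though each step sees only one.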

Abstract

Modern knowledge-intensive systems, such as retrieval-augmented generation (RAG), rely on effective retrievers to establish the performance ceiling for downstream modules. However, retriever training has been bottlenecked by sparse, single-positive annotations, which lead to false-negative noise and suboptimal supervision. While the advent of large language models (LLMs) makes it feasible to collect comprehensive multi-positive relevance labels at scale, the optimal strategy for incorporating these dense signals into training remains poorly understood. In this paper, we present a systematic study of multi-positive optimization objectives for retriever training. We unify representative objectives, including Joint Likelihood (JointLH), Summed Marginal Likelihood (SumMargLH), and Log-Sum-Exp Pairwise (LSEPair) loss, under a shared contrastive learning framework. Our theoretical analysis…
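The abstract names JointLH, SumMargLH, and LSEPair but is truncated before giving their definitions. The sketch below shows one common way to write each objective from a single query's positive and negative scores. The exact forms are our assumptions, not the paper's equations: JointLH and SumMargLH use one softmax over all candidates, JointLH averages over positives, and no temperature or margin is applied. The function names are hypothetical.

```python
import math


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def joint_lh(pos, neg):
    """Assumed JointLH: mean negative log-softmax of each positive,
    where the softmax runs over ALL candidates (positives + negatives),
    so positives also compete with one another."""
    lse_all = log_sum_exp(list(pos) + list(neg))
    return sum(lse_all - s for s in pos) / len(pos)


def sum_marg_lh(pos, neg):
    """Assumed SumMargLH: -log of the total softmax probability mass
    assigned to the positive set, i.e. -log(sum_p P(p | q))."""
    return log_sum_exp(list(pos) + list(neg)) - log_sum_exp(list(pos))


def lse_pair(pos, neg):
    """Assumed LSEPair: log(1 + sum_{p, n} exp(s_n - s_p)), a smooth
    maximum over every positive-negative pairwise violation."""
    diffs = [n - p for p in pos for n in neg]
    return log_sum_exp([0.0] + diffs)
```

Usage: `joint_lh([3.0, 1.0], [0.5, -1.0])`, `sum_marg_lh(...)`, and `lse_pair(...)` score the same query under each objective. With a single positive, all three reduce to the usual InfoNCE loss. Under these assumed forms, JointLH places each positive in the others' denominator, SumMargLH is satisfied once any positive dominates, and LSEPair compares positives only against negatives.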

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Text and Document Classification Technologies