Neural Passage Retrieval with Improved Negative Contrast
Jing Lu, Gustavo Hernandez Abrego, Ji Ma, Jianmo Ni, Yinfei Yang

TL;DR
This paper investigates four negative sampling strategies for dual encoder passage retrieval models (three retrieval-based, one heuristic), showing that combining retrieval-based and heuristic negatives strengthens contrastive learning and achieves state-of-the-art results in open-domain question answering.
Contribution
The paper introduces and evaluates four negative sampling strategies that complement random sampling, showing that mixed strategies improve retrieval performance and set new state-of-the-art results.
Findings
Mixed negative sampling strategies outperform single strategies.
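The approach achieves state-of-the-art results on two passage retrieval tasks.
Negatives sampled beyond random selection sharpen the contrast between relevant and irrelevant passages and improve retrieval accuracy, although no single strategy is best on every task.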
Abstract
In this paper we explore the effects of negative sampling in dual encoder models used to retrieve passages for automatic question answering. We explore four negative sampling strategies that complement the straightforward random sampling of negatives, typically used to train dual encoder models. Out of the four strategies, three are based on retrieval and one on heuristics. Our retrieval-based strategies are based on the semantic similarity and the lexical overlap between questions and passages. We train the dual encoder models in two stages: pre-training with synthetic data and fine tuning with domain-specific data. We apply negative sampling to both stages. The approach is evaluated in two passage retrieval tasks. Even though it is not evident that there is one single sampling strategy that works best in all the tasks, it is clear that our strategies contribute to improving the…
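The training setup described in the abstract can be illustrated with a minimal sketch of a dual encoder contrastive loss. This is not the authors' implementation: the function name, the dot-product scoring, and the choice of one extra sampled negative per question are all assumptions made for illustration. Each question is scored against its positive passage, the other positives in the batch (which act as random negatives), and the extra negatives (which could come from a retrieval-based or heuristic strategy).

```python
import numpy as np

def contrastive_loss(q, p_pos, p_neg):
    """Softmax cross-entropy over in-batch negatives plus sampled negatives.

    Hypothetical helper for illustration only (not from the paper).
    q:     (B, D) question embeddings
    p_pos: (B, D) embedding of the positive passage for each question
    p_neg: (B, D) embedding of one extra sampled negative per question
    """
    # Candidate pool: all positives in the batch, then all sampled negatives.
    candidates = np.concatenate([p_pos, p_neg], axis=0)   # (2B, D)
    scores = q @ candidates.T                              # (B, 2B) dot-product scores
    # Numerically stable log-softmax over the candidate pool.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # The positive passage for question i sits in column i.
    idx = np.arange(q.shape[0])
    return float(-log_probs[idx, idx].mean())

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
questions = rng.normal(size=(4, 8))
positives = questions + 0.1 * rng.normal(size=(4, 8))
negatives = rng.normal(size=(4, 8))
loss = contrastive_loss(questions, positives, negatives)
```

In this sketch, a negative that scores close to the positive raises the loss more than an unrelated passage does, which is the intuition behind choosing negatives by semantic similarity or lexical overlap rather than at random.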
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
