Remining Hard Negatives for Generative Pseudo Labeled Domain Adaptation
Goksenin Yuksel, David Rau, Jaap Kamps

TL;DR
This paper improves domain adaptation for neural information retrieval by analyzing hard negatives and refreshing them during training, which boosts ranking performance on 13 of 14 BEIR datasets and 9 of 12 LoTTe datasets.
Contribution
It introduces a hard-negative re-mining method (R-GPL) that refreshes the hard-negative index during Generative Pseudo Labeling (GPL) training, making dense retrievers more robust in cross-domain settings.
Findings
Boosts ranking performance in 13/14 BEIR datasets
Improves results in 9/12 LoTTe datasets
Analyzes the impact of hard negatives on domain adaptation
Abstract
Dense retrievers have demonstrated significant potential for neural information retrieval; however, they exhibit a lack of robustness to domain shifts, thereby limiting their efficacy in zero-shot settings across diverse domains. A state-of-the-art domain adaptation technique is Generative Pseudo Labeling (GPL). GPL uses synthetic query generation and initially mined hard negatives to distill knowledge from a cross-encoder to dense retrievers in the target domain. In this paper, we analyze the documents retrieved by the domain-adapted model and discover that these are more relevant to the target queries than those of the non-domain-adapted model. We then propose refreshing the hard-negative index during the knowledge distillation phase to mine better hard negatives. Our remining R-GPL approach boosts ranking performance in 13/14 BEIR datasets and 9/12 LoTTe datasets. Our contributions are…
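The re-mining idea described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the function names, the dot-product scoring, the refresh interval, and the hand-written embeddings are all assumptions made for illustration, and the real method would use a trained dense retriever and a cross-encoder teacher. The sketch only shows that hard negatives mined from an updated encoder can differ from those mined once at the start.

```python
import numpy as np

def mine_hard_negatives(query_vecs, doc_vecs, positive_ids, k=2):
    """Per query, return the k highest-scoring documents that are not the positive.

    Hypothetical helper: scores are plain dot products between embeddings.
    """
    scores = query_vecs @ doc_vecs.T  # shape (num_queries, num_docs)
    negatives = []
    for qi, pos in enumerate(positive_ids):
        ranked = np.argsort(-scores[qi], kind="stable")  # best-scoring first
        negatives.append([int(d) for d in ranked if d != pos][:k])
    return negatives

def remining_schedule(total_steps, refresh_every):
    """Training steps at which the hard-negative index is rebuilt.

    Step 0 is the initial mining; the interval is an assumed hyperparameter.
    """
    return list(range(0, total_steps, refresh_every))

# Toy data: one query, four documents, document 0 is the pseudo-positive.
query = np.array([[1.0, 0.0]])
docs_before = np.array([[1.0, 0.0], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])
# Pretend the document embeddings moved after some domain-adaptation steps.
docs_after = np.array([[1.0, 0.0], [0.1, 0.9], [0.8, 0.2], [0.9, 0.1]])

initial = mine_hard_negatives(query, docs_before, [0], k=2)
refreshed = mine_hard_negatives(query, docs_after, [0], k=2)
print(initial, refreshed, remining_schedule(10, 4))
```

In a full training loop, the negatives would be re-mined at each step listed by the schedule using the current state of the retriever, and the distillation would continue with the refreshed set.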
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research
Methods: Knowledge Distillation
