Negative Sampling Techniques in Information Retrieval: A Survey
Laurin Wischounig, Abdelrahman Abdallah, Adam Jatowt

TL;DR
This survey reviews negative sampling techniques in dense information retrieval, emphasizing recent LLM-driven methods, and categorizes approaches based on their effectiveness, cost, and complexity to guide future research.
Contribution
It provides the first comprehensive overview focusing on modern NLP and LLM-based negative sampling methods in dense IR, with a new taxonomy and analysis of trade-offs.
Findings
LLM-driven synthetic negative sampling shows promising results.
Trade-offs exist between effectiveness and computational cost.
Current challenges include data quality and scalability.
Abstract
Information Retrieval (IR) is fundamental to many modern NLP applications. The rise of dense retrieval (DR), using neural networks to learn semantic vector representations, has significantly advanced IR performance. Central to training effective dense retrievers through contrastive learning is the selection of informative negative samples. Synthesizing 35 seminal papers, this survey provides a comprehensive and up-to-date overview of negative sampling techniques in dense IR. Our unique contribution is the focus on modern NLP applications and the inclusion of recent Large Language Model (LLM)-driven methods, an area absent in prior reviews. We propose a taxonomy that categorizes techniques including random, static/dynamically mined, and synthetic datasets. We then analyze these approaches with respect to trade-offs between effectiveness, computational cost, and implementation difficulty.…
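To make the abstract's point about negative selection concrete, here is a minimal sketch, not the survey's method. It assumes a dot-product retriever and an InfoNCE-style contrastive loss, and it contrasts two of the categories the abstract names: random negatives and statically mined (top-scoring) negatives. The toy vectors and the function names `contrastive_loss`, `sample_random_negatives`, and `mine_hard_negatives` are hypothetical and chosen only for illustration.

```python
import math
import random

def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(query, positive, negatives, temperature=1.0):
    """InfoNCE-style loss: negative log-softmax of the positive's score
    among the positive and all negatives (assumed loss, for illustration)."""
    scores = [dot(query, positive) / temperature]
    scores += [dot(query, n) / temperature for n in negatives]
    m = max(scores)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denom - scores[0]

def sample_random_negatives(corpus, positive_idx, k, rng):
    """Random negatives: k passage indices drawn uniformly, excluding the positive."""
    candidates = [i for i in range(len(corpus)) if i != positive_idx]
    return rng.sample(candidates, k)

def mine_hard_negatives(query, corpus, positive_idx, k):
    """Mined negatives: the k non-positive passages the current scorer ranks highest."""
    candidates = [i for i in range(len(corpus)) if i != positive_idx]
    candidates.sort(key=lambda i: dot(query, corpus[i]), reverse=True)
    return candidates[:k]

# Toy 2-d "embeddings" (hypothetical data); index 0 is the labeled positive.
corpus = [
    [1.0, 0.0],   # 0: positive passage
    [0.9, 0.1],   # 1: very similar to the query
    [0.0, 1.0],   # 2: unrelated
    [-1.0, 0.0],  # 3: opposite direction
    [0.7, 0.7],   # 4: partially similar
]
query = [1.0, 0.0]

rng = random.Random(0)
random_idx = sample_random_negatives(corpus, 0, 2, rng)
hard_idx = mine_hard_negatives(query, corpus, 0, 2)

random_loss = contrastive_loss(query, corpus[0], [corpus[i] for i in random_idx])
hard_loss = contrastive_loss(query, corpus[0], [corpus[i] for i in hard_idx])

print("random negatives:", random_idx, "loss:", round(random_loss, 4))
print("mined negatives: ", hard_idx, "loss:", round(hard_loss, 4))
```

In this sketch the mined negatives can never yield a smaller loss than a random draw of the same size, because they are by construction the highest-scoring non-positives. That larger training signal is what makes them informative. The cost is that every passage must be scored for each query, and a top-ranked unlabeled passage may in fact be relevant (a false negative). This is one instance of the effectiveness versus cost trade-off the abstract refers to.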
Taxonomy
Topics: Information Retrieval and Search Behavior · Expert finding and Q&A systems · Text and Document Classification Technologies
