Enhancing Retrieval Performance: An Ensemble Approach For Hard Negative Mining
Hansa Meghwani

TL;DR
This paper introduces a novel ensemble approach to hard negative mining for training cross-encoder models. The approach significantly improves retrieval performance in domain-specific settings and benefits LLM-based systems that depend on retrieval.
Contribution
It presents a robust hard negative mining technique tailored to enterprise datasets, improving the training efficiency and effectiveness of cross-encoder re-ranking models.
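The summary does not say which retrievers the ensemble combines or how their outputs are merged. As a minimal sketch, assume the ensemble fuses ranked candidate lists from several first-stage retrievers with reciprocal rank fusion (RRF). RRF is a standard fusion method swapped in here for illustration. Hard negatives are then the highest-fused documents that are not labeled positive. The function names, the three runs, the constant `k=60`, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one RRF score per doc."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each retriever adds 1 / (k + rank) for every doc it ranks.
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


def mine_hard_negatives(rankings, positives, num_negatives=3, k=60):
    """Return the top fused documents that are not labeled positive (hypothetical helper)."""
    fused = reciprocal_rank_fusion(rankings, k=k)
    # Sort by descending fused score; break ties by doc ID for determinism.
    ordered = sorted(fused, key=lambda d: (-fused[d], d))
    return [d for d in ordered if d not in positives][:num_negatives]


# Invented rankings for one query from three assumed retrievers.
bm25_run = ["d3", "d1", "d7", "d2"]
dense_run = ["d1", "d3", "d5", "d7"]
third_run = ["d7", "d1", "d3", "d9"]

print(mine_hard_negatives([bm25_run, dense_run, third_run], positives={"d1"}))
# prints ['d3', 'd7', 'd5']
```

Documents that several retrievers rank near the top accumulate the largest fused scores. Those are the passages most likely to confuse a re-ranker. A practical pipeline would also need to guard against false negatives, meaning unlabeled passages that are actually relevant and end up mined as negatives.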
Findings
Hard negative sampling outperforms random sampling in model training.
Learning both similarity and dissimilarity improves retrieval accuracy.
The approach benefits large language model (LLM) systems such as retrieval-augmented generation (RAG) and ReAct.
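To make the contrast between hard and random sampling concrete, the sketch below compares what each sampler returns. It uses invented retriever scores and hypothetical helper names, and it does not reproduce the paper's experiments. It only shows that the top-scoring non-positive documents, by construction, score at least as high for the query on average as a random draw from the same pool. This is why they carry a more informative dissimilarity signal.

```python
import random


def random_negatives(corpus_ids, positives, n, seed=0):
    """Uniformly sample n non-positive documents (the random-sampling baseline)."""
    rng = random.Random(seed)
    pool = [d for d in corpus_ids if d not in positives]
    return rng.sample(pool, n)


def hard_negatives(scores, positives, n):
    """Take the n highest-scoring non-positive documents under a retriever."""
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked if d not in positives][:n]


def mean_score(doc_ids):
    """Average retriever score of the selected documents."""
    return sum(scores[d] for d in doc_ids) / len(doc_ids)


# Invented retriever scores for one query; d1 is the labeled positive.
scores = {"d1": 0.92, "d2": 0.10, "d3": 0.85, "d4": 0.05, "d5": 0.71, "d6": 0.02}
positives = {"d1"}

hard = hard_negatives(scores, positives, 2)
rand = random_negatives(list(scores), positives, 2)
print("hard:", hard, round(mean_score(hard), 2))
print("random:", rand, round(mean_score(rand), 2))
```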
Abstract
Ranking consistently emerges as a primary focus in information retrieval research. Retrieval and ranking models serve as the foundation for numerous applications, including web search, open-domain QA, enterprise-domain QA, and text-based recommender systems. Typically, these models are trained on triplets with binary relevance labels, each pairing a query with one positive and one negative passage. In practice, however, they are deployed in settings that demand a far more nuanced understanding of relevance, especially when re-ranking a large pool of potentially relevant passages. Collecting positive examples through user feedback such as impressions or clicks is straightforward. Identifying suitable negative pairs from a pool of possibly millions or even billions of documents poses a greater challenge. Generating a substantial number of negative pairs is…
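The triplet training setup described in the abstract can be sketched under a pointwise binary objective, which is an assumption here. Each (query, positive, negative) triplet is expanded into one relevant and one non-relevant pair, and the cross-encoder's raw score for each pair is trained with binary cross-entropy. `toy_score` is a hypothetical stand-in for a real cross-encoder that only counts shared tokens, and the example texts are invented. The paper's actual loss and model may differ.

```python
import math


def triplets_to_pairs(triplets):
    """Expand (query, positive, negative) triplets into (query, passage, label) pairs."""
    pairs = []
    for query, positive, negative in triplets:
        pairs.append((query, positive, 1))  # relevant: learn similarity
        pairs.append((query, negative, 0))  # non-relevant: learn dissimilarity
    return pairs


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy on a raw relevance score."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def toy_score(query, passage):
    """Hypothetical cross-encoder stand-in: shared-token count minus 1."""
    shared = set(query.lower().split()) & set(passage.lower().split())
    return len(shared) - 1.0


triplets = [
    ("reset vpn password",
     "To reset your VPN password open the portal",
     "VPN outages are listed on the status page"),
]
pairs = triplets_to_pairs(triplets)
losses = [bce_with_logits(toy_score(q, p), y) for q, p, y in pairs]
print("mean loss:", sum(losses) / len(losses))
```

In a real setup the score would come from a trained cross-encoder, and the loss would be minimized by gradient descent. The toy scorer only shows how labeled pairs and the loss fit together.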
