Enhancing Retrieval Performance: An Ensemble Approach For Hard Negative Mining

Hansa Meghwani

arXiv:2411.02404·cs.IR·December 16, 2024

Open Access

TL;DR

This paper introduces a novel ensemble approach for hard negative mining in training cross-encoder models, significantly improving retrieval performance in domain-specific contexts and benefiting advanced LLM systems.

Contribution

It presents a robust hard negative mining technique tailored for enterprise datasets, enhancing the training efficiency and effectiveness of cross-encoder re-rank models.

Findings

01

Hard negative sampling outperforms random sampling in model training.

02

Learning both similarity and dissimilarity improves retrieval accuracy.

03

The approach benefits large language model systems like RAG and ReAct.
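To make the contrast between hard and random negatives concrete, here is a minimal sketch of mining hard negatives for one query and assembling (query, positive, negative) triplets. It is an illustration only and not the paper's ensemble method: the token-overlap scorer `jaccard`, the helper names `mine_hard_negatives` and `build_triplets`, and the toy corpus are all assumptions made for this example. In practice the scorer would likely be a retrieval model such as BM25 or a bi-encoder.

```python
import random


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for a real retrieval scorer (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def mine_hard_negatives(query, positives, corpus, k=2, score=jaccard):
    """Return the k non-positive passages that score highest against the query."""
    pos = set(positives)
    candidates = [d for d in corpus if d not in pos]
    ranked = sorted(candidates, key=lambda d: score(query, d), reverse=True)
    return ranked[:k]


def sample_random_negatives(positives, corpus, k=2, seed=0):
    """Baseline: draw k non-positive passages uniformly at random."""
    pos = set(positives)
    candidates = [d for d in corpus if d not in pos]
    return random.Random(seed).sample(candidates, k)


def build_triplets(query, positives, negatives):
    """Pair every positive with every negative for the given query."""
    return [(query, p, n) for p in positives for n in negatives]


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    "reset your account password from the login page",
    "reset the router to factory settings",
    "change your account email address",
    "quarterly sales report for the finance team",
]
query = "how to reset account password"
positives = [corpus[0]]

hard = mine_hard_negatives(query, positives, corpus, k=2)
rand = sample_random_negatives(positives, corpus, k=2)
triplets = build_triplets(query, positives, hard)

for q, p, n in triplets:
    print(f"query={q!r}\n  pos={p!r}\n  neg={n!r}")
```

The hard negatives share vocabulary with the query while remaining irrelevant, which is what makes them more informative for a re-ranker than a randomly drawn passage. One caveat of any similarity-based miner is that unlabeled relevant passages can rank highly and be picked as false negatives, so a filtering step is often needed.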

Abstract

Ranking consistently emerges as a primary focus in information retrieval research. Retrieval and ranking models serve as the foundation for numerous applications, including web search, open domain QA, enterprise domain QA, and text-based recommender systems. Typically, these models undergo training on triplets consisting of binary relevance assignments, comprising one positive and one negative passage. However, their utilization involves a context where a significantly more nuanced understanding of relevance is necessary, especially when re-ranking a large pool of potentially relevant passages. Although collecting positive examples through user feedback like impressions or clicks is straightforward, identifying suitable negative pairs from a vast pool of possibly millions or even billions of documents poses a greater challenge. Generating a substantial number of negative pairs is…

Taxonomy

Topics: Rough Sets and Fuzzy Logic · Imbalanced Data Classification Techniques · Data Mining Algorithms and Applications

Methods: Focus