Hard Negative Mining for Domain-Specific Retrieval in Enterprise Systems

Hansa Meghwani; Amit Agarwal; Priyaranjan Pattnayak; Hitesh Laxmichand Patel; Srikant Panda

arXiv:2505.18366 · cs.IR · May 27, 2025

TL;DR

This paper introduces a scalable hard-negative mining framework for enterprise search. By selecting challenging negative examples to enhance re-ranking models, it improves MRR@3 by 15% and MRR@10 by 19% on a proprietary enterprise corpus, and it is also effective on public datasets.

Contribution

We propose a novel, efficient hard-negative mining method tailored for domain-specific enterprise data, enhancing re-ranking models and improving retrieval metrics.

Findings

01. 15% improvement in MRR@3 on enterprise data

02. 19% improvement in MRR@10 on enterprise data

03. Effective on both proprietary and public datasets
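
For readers unfamiliar with the metric, MRR@k averages, over all queries, the reciprocal of the rank at which the first relevant document appears within the top k results (0 if none appears). The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code; the function names and document IDs are hypothetical. This page does not state whether the reported gains are relative or absolute.

```python
from typing import List, Sequence, Set


def reciprocal_rank_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Return 1/rank of the first relevant document in the top k, or 0.0 if none."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mrr_at_k(rankings: List[Sequence[str]], relevants: List[Set[str]], k: int) -> float:
    """Average the reciprocal rank at k over all queries."""
    if not rankings:
        return 0.0
    total = sum(
        reciprocal_rank_at_k(ranked, relevant, k)
        for ranked, relevant in zip(rankings, relevants)
    )
    return total / len(rankings)


# Hypothetical rankings for three queries (illustrative IDs only).
example_rankings = [
    ["d3", "d1", "d7"],        # relevant doc at rank 2
    ["d2", "d9", "d4"],        # relevant doc at rank 3
    ["d5", "d6", "d8", "d0"],  # relevant doc at rank 4, outside the top 3
]
example_relevants = [{"d1"}, {"d4"}, {"d0"}]

print(mrr_at_k(example_rankings, example_relevants, k=3))
print(mrr_at_k(example_rankings, example_relevants, k=10))
```

With a small k such as 3, a relevant document ranked fourth contributes nothing, which is why MRR@3 and MRR@10 can move by different amounts for the same re-ranker.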

Abstract

Enterprise search systems often struggle to retrieve accurate, domain-specific information due to semantic mismatches and overlapping terminologies. These issues can degrade the performance of downstream applications such as knowledge management, customer support, and retrieval-augmented generation agents. To address this challenge, we propose a scalable hard-negative mining framework tailored specifically for domain-specific enterprise data. Our approach dynamically selects semantically challenging but contextually irrelevant documents to enhance deployed re-ranking models. Our method integrates diverse embedding models, performs dimensionality reduction, and uniquely selects hard negatives, ensuring computational efficiency and semantic precision. Evaluation on our proprietary enterprise corpus (cloud services domain) demonstrates substantial improvements of 15% in MRR@3 and 19%…
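
The general idea of hard-negative mining described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not name the embedding models, the dimensionality reduction technique, or the exact selection criterion. The sketch assumes embeddings from several models are combined by concatenating L2-normalized vectors, skips dimensionality reduction entirely, and treats the non-relevant documents most similar to the query as hard negatives. All function names and vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine_embeddings(per_model_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Assumption: concatenate the normalized vector from each model, then renormalize."""
    combined: List[float] = []
    for vec in per_model_vectors:
        combined.extend(l2_normalize(vec))
    return l2_normalize(combined)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity because inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def mine_hard_negatives(
    query_vec: Sequence[float],
    positive_id: str,
    doc_vecs: Dict[str, Sequence[float]],
    num_negatives: int,
) -> List[str]:
    """Return the non-relevant documents most similar to the query."""
    scored = [
        (cosine(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
        if doc_id != positive_id  # never pick the labeled positive as a negative
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:num_negatives]]


# Hypothetical 2-d vectors from two hypothetical embedding models.
query = combine_embeddings([[1.0, 0.0], [1.0, 0.0]])
docs = {
    "pos": combine_embeddings([[1.0, 0.0], [1.0, 0.0]]),
    "near": combine_embeddings([[0.9, 0.1], [0.8, 0.2]]),
    "mid": combine_embeddings([[0.5, 0.5], [0.6, 0.4]]),
    "far": combine_embeddings([[0.0, 1.0], [0.1, 0.9]]),
}
print(mine_hard_negatives(query, "pos", docs, num_negatives=2))
```

A production pipeline would also need a guard against false negatives, that is, unlabeled documents that are in fact relevant; how the paper handles that is not stated in this excerpt.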
