Hard Negative Mining for Domain-Specific Retrieval in Enterprise Systems
Hansa Meghwani, Amit Agarwal, Priyaranjan Pattnayak, Hitesh Laxmichand Patel, Srikant Panda

TL;DR
This paper introduces a scalable hard-negative mining framework for enterprise search that improves retrieval accuracy by selecting challenging negative examples, reporting improvements of 15% in MRR@3 and 19% in MRR@10 on a proprietary enterprise corpus and gains on public datasets as well.
Contribution
We propose a novel, efficient hard-negative mining method tailored for domain-specific enterprise data, enhancing re-ranking models and improving retrieval metrics.
Findings
15% improvement in MRR@3 on enterprise data
19% improvement in MRR@10 on enterprise data
Effective on both proprietary and public datasets
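For readers unfamiliar with the metric, the following is a minimal sketch of how MRR@k is commonly computed: for each query, take the reciprocal rank of the first relevant document within the top k results (0 if none appears), then average over queries. The function name `mrr_at_k` and the toy document IDs are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence, Set


def mrr_at_k(rankings: Sequence[Sequence[str]],
             relevant: Sequence[Set[str]],
             k: int) -> float:
    """Mean reciprocal rank truncated at k (hypothetical helper).

    rankings[i] is the ranked list of document IDs returned for query i;
    relevant[i] is the set of IDs judged relevant for that query.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        # Only the first relevant hit within the top k contributes.
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy example: query 1 has its first relevant hit at rank 2, query 2 at rank 3.
toy_rankings = [["a", "b", "c"], ["x", "y", "z"]]
toy_relevant = [{"b"}, {"z"}]
score_at_3 = mrr_at_k(toy_rankings, toy_relevant, k=3)
score_at_2 = mrr_at_k(toy_rankings, toy_relevant, k=2)
```

In the toy example, truncating at k=2 drops the second query's hit at rank 3, which is why MRR@3 and MRR@10 can move by different amounts for the same model change.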
Abstract
Enterprise search systems often struggle to retrieve accurate, domain-specific information due to semantic mismatches and overlapping terminologies. These issues can degrade the performance of downstream applications such as knowledge management, customer support, and retrieval-augmented generation agents. To address this challenge, we propose a scalable hard-negative mining framework tailored specifically for domain-specific enterprise data. Our approach dynamically selects semantically challenging but contextually irrelevant documents to enhance deployed re-ranking models. Our method integrates diverse embedding models, performs dimensionality reduction, and uniquely selects hard negatives, ensuring computational efficiency and semantic precision. Evaluation on our proprietary enterprise corpus (cloud services domain) demonstrates substantial improvements of 15% in MRR@3 and 19%…
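The three stages named in the abstract (combining embeddings from several models, reducing dimensionality, and selecting hard negatives) can be illustrated with a rough sketch. Everything below is an assumption made for illustration: the random arrays stand in for real embedding-model outputs, PCA via SVD is one possible reduction, and ranking non-positive documents by cosine similarity to the query is a generic selection rule. The paper's actual selection criteria are not given in the text above, so this should not be read as the authors' method.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against zero rows)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def pca_reduce(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project rows of x onto their top principal components via SVD."""
    centered = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def mine_hard_negatives(query_vec, doc_vecs, positive_idx, num_negatives):
    """Return indices of the non-positive docs most similar to the query.

    Assumed rule: a "hard" negative is a document that is close to the
    query in embedding space but is not labeled as relevant.
    """
    q = np.asarray(query_vec, dtype=float)
    q = q / max(np.linalg.norm(q), 1e-12)
    sims = l2_normalize(np.asarray(doc_vecs, dtype=float)) @ q
    positives = set(positive_idx)
    ranked = [int(i) for i in np.argsort(-sims) if int(i) not in positives]
    return ranked[:num_negatives]


# Stand-ins for two embedding models' outputs: row 0 is the query, rows 1-5 are docs.
rng = np.random.default_rng(0)
model_a = rng.normal(size=(6, 8))   # hypothetical 8-dim model
model_b = rng.normal(size=(6, 6))   # hypothetical 6-dim model

# Normalize per model so neither dominates, concatenate, then reduce.
combined = np.hstack([l2_normalize(model_a), l2_normalize(model_b)])
reduced = pca_reduce(combined, n_components=4)

# Doc index 0 (row 1 of `reduced`) is treated as the labeled positive.
hard_negs = mine_hard_negatives(reduced[0], reduced[1:], positive_idx=[0],
                                num_negatives=2)
```

The mined indices would typically be paired with the query and its positive as training triples for a re-ranker; normalizing each model's output before concatenation is a design choice made here so that models with larger vector norms do not dominate the combined space.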
