ECI: Effective Contrastive Information to Evaluate Hard-Negatives
Aarush Sinha, Rahul Seetharaman, Aman Bansal

TL;DR
This paper introduces ECI (Effective Contrastive Information), a theoretically grounded, information-theoretic metric for evaluating the quality of hard negatives used to train dense retrieval models, reducing the need for costly ablation studies.
Contribution
ECI provides a novel, principled method to assess hard negatives before fine-tuning, improving efficiency and accuracy in dense retrieval training.
Findings
ECI accurately predicts downstream retrieval performance.
Hybrid negative sampling strategies balance volume and reliability.
ECI reduces the need for extensive ablation studies.
Abstract
Hard negatives play a critical role in training and fine-tuning dense retrieval models: they are semantically similar to positive documents yet non-relevant, and learning to distinguish them correctly is essential for improving retrieval accuracy. However, identifying effective hard negatives typically requires extensive ablation studies that repeat fine-tuning across different negative sampling strategies and hyperparameters, which incurs substantial computational cost. In this paper, we introduce ECI: Effective Contrastive Information, a metric grounded in Information Theory and Information Retrieval principles that enables practitioners to assess the quality of hard negatives prior to model fine-tuning. ECI evaluates negatives by optimizing the trade-off between Information Capacity, the logarithmic bound on mutual information determined by set size and…
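The "logarithmic bound on mutual information determined by set size" is presumably the standard InfoNCE bound: with one positive and N − 1 negatives per query, I(q; d⁺) ≥ log N − L_InfoNCE, so no contrastive set of size N can certify more than log N nats. The abstract is truncated before ECI's own formula, so the sketch below is not ECI. It only illustrates that underlying bound. The function names and the `temperature` parameter are hypothetical choices for illustration, and the per-query value is only a rough proxy, since the bound formally holds in expectation over queries.

```python
import math
from typing import Sequence


def info_capacity(num_negatives: int) -> float:
    """Hypothetical helper: log N, where N = 1 positive + num_negatives negatives."""
    return math.log(num_negatives + 1)


def infonce_loss(pos_score: float, neg_scores: Sequence[float],
                 temperature: float = 1.0) -> float:
    """InfoNCE loss for one query: -log softmax(positive) over the candidate set."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


def mi_lower_bound(pos_score: float, neg_scores: Sequence[float],
                   temperature: float = 1.0) -> float:
    """Per-query estimate of log N - L_InfoNCE; it can never exceed log N."""
    return info_capacity(len(neg_scores)) - infonce_loss(
        pos_score, neg_scores, temperature)


# Usage: negatives scored nearly as high as the positive (hard negatives)
# leave a small estimate, while easy negatives push it toward log N.
hard = mi_lower_bound(0.90, [0.88, 0.87, 0.85], temperature=0.05)
easy = mi_lower_bound(0.90, [0.10, 0.05, 0.00], temperature=0.05)
```

Because the loss is always non-negative, adding negatives raises the ceiling log N, while negatives that are too easy contribute little beyond that ceiling. This is the tension between set size and negative quality that the abstract describes.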
