Reliable Evaluation Protocol for Low-Precision Retrieval
Kisu Yang, Yoonna Jang, Hwanseok Jang, Kenneth Choi, Isabelle Augenstein, Heuiseok Lim

TL;DR
This paper introduces a robust evaluation protocol for low-precision retrieval systems that reduces the score variability caused by tied relevance scores, making performance assessment more reliable.
Contribution
It proposes High-Precision Scoring (HPS) and Tie-aware Retrieval Metrics (TRM) to make the evaluation of low-precision retrieval models more stable and accurate.
Findings
HPS significantly reduces tie-induced instability.
TRM accurately estimates expected metric values.
The combined approach enhances evaluation consistency.
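To make "expected metric value" concrete, the sketch below computes DCG@k when tied candidates are ordered uniformly at random. It relies on linearity of expectation: every document in a tie group is equally likely to land at each position the group occupies, so each of those positions contributes the group's mean gain. It also reports the best and worst achievable orderings, whose difference is the range. This is a minimal sketch under those assumptions, not the paper's TRM implementation. `dcg_bounds` is a hypothetical helper, and the bias term that TRM also reports is not reproduced here.

```python
import numpy as np


def dcg_bounds(scores, gains, k):
    """Return (worst, expected, best) DCG@k when tied scores are ordered uniformly at random.

    Hypothetical helper for illustration; ties are exact score equalities.
    """
    scores = np.asarray(scores)
    order = np.argsort(-scores, kind="stable")  # rank by descending score
    s = scores[order]
    g = np.asarray(gains, dtype=float)[order]
    # Standard DCG discount 1/log2(position + 1), zeroed beyond the cutoff k.
    discounts = 1.0 / np.log2(np.arange(2, len(s) + 2))
    discounts[k:] = 0.0
    worst = expected = best = 0.0
    start = 0
    while start < len(s):
        # Find the tie group s[start:end] sharing exactly the same score.
        end = start
        while end < len(s) and s[end] == s[start]:
            end += 1
        group, d = g[start:end], discounts[start:end]
        expected += group.mean() * d.sum()   # each position gets the group's mean gain
        best += np.sort(group)[::-1] @ d     # most relevant tied docs placed first
        worst += np.sort(group) @ d          # most relevant tied docs placed last
        start = end
    return float(worst), float(expected), float(best)


# One relevant document tied with two non-relevant ones at the top of the list.
worst, expected, best = dcg_bounds([0.5, 0.5, 0.5, 0.2], [0, 0, 1, 0], k=3)
print(worst, expected, best)  # worst = 0.5, expected is about 0.7103, best = 1.0
```

A single deterministic tie-breaking rule would report one value anywhere between 0.5 and 1.0 for this query; the expected value and the range describe the whole spread instead.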
Abstract
Lowering the numerical precision of model parameters and computations is a widely adopted way to improve the efficiency of retrieval systems. However, when relevance scores between the query and documents are computed in low precision, we observe spurious ties caused by the reduced granularity. Results then vary substantially depending on how these ties are resolved, which makes the evaluation less reliable. To address this, we propose a more robust retrieval evaluation protocol designed to reduce score variation. It consists of: (1) High-Precision Scoring (HPS), which upcasts the final scoring step to higher precision to resolve tied candidates at minimal computational cost; and (2) Tie-aware Retrieval Metrics (TRM), which report expected scores, range, and bias to quantify the uncertainty in the order of tied candidates. Our experiments test multiple models with three scoring functions on two retrieval datasets…
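The spurious-tie mechanism and the idea behind HPS can be illustrated with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: random unit vectors stand in for embeddings from a low-precision encoder, fp16 stands in for "low precision", fp32 stands in for the higher scoring precision, the scoring function is a plain dot product, and `count_tied_docs` is a hypothetical helper. Only the final dot product is upcast, as the abstract describes.

```python
import numpy as np


def count_tied_docs(scores: np.ndarray) -> int:
    """Number of documents whose score is shared with at least one other document."""
    _, counts = np.unique(scores, return_counts=True)
    return int(counts[counts > 1].sum())


rng = np.random.default_rng(0)
dim, n_docs = 64, 5000

# Stand-ins for encoder outputs: random unit vectors (an assumption, not real embeddings).
docs = rng.standard_normal((n_docs, dim))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = rng.standard_normal(dim)
query /= np.linalg.norm(query)

# Embeddings as a low-precision (fp16) model would emit them.
docs16, query16 = docs.astype(np.float16), query.astype(np.float16)

# Low-precision scoring: the dot products come back as float16.
low = docs16 @ query16
# HPS-style scoring: same fp16 embeddings, upcast to fp32 only for the final dot product.
high = docs16.astype(np.float32) @ query16.astype(np.float32)

ties_low = count_tied_docs(low)
ties_high = count_tied_docs(high)
print(f"tied documents: fp16 scoring = {ties_low}, fp32 scoring = {ties_high}")
```

Because fp16 keeps only 11 significant bits, many nearby scores collapse onto the same representable value, and the order among them is then decided by whatever tie-breaking rule the sort happens to use. Upcasting only the final scoring step leaves the encoder untouched, which is why its extra cost is small.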
