How important is Recall for Measuring Retrieval Quality?
Shelly Schwartz, Oleg Vasilyev, Randy Sawaya

TL;DR
This paper examines how much recall matters when measuring retrieval quality. It proposes a new measure that performs well without knowing the total number of relevant documents, and evaluates it against LLM-based judgments of response quality.
Contribution
It evaluates several established strategies for measuring retrieval quality when recall cannot be computed, and introduces a simple measure suited to realistic settings with large and evolving knowledge bases.
Findings
Correlation between retrieval metrics and LLM judgments varies across datasets.
The proposed measure performs well without requiring the total number of relevant documents.
Experiments support the new measure across multiple datasets with a relatively low number of relevant documents (2-15).
Abstract
In realistic retrieval settings with large and evolving knowledge bases, the total number of documents relevant to a query is typically unknown, and recall cannot be computed. In this paper, we evaluate several established strategies for handling this limitation by measuring the correlation between retrieval quality metrics and LLM-based judgments of response quality, where responses are generated from the retrieved documents. We conduct experiments across multiple datasets with a relatively low number of relevant documents (2-15). We also introduce a simple retrieval quality measure that performs well without requiring knowledge of the total number of relevant documents.
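The evaluation setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's proposed measure, which is not specified here. It only shows why recall needs the total number of relevant documents while a metric such as precision@k does not, and how a metric could be correlated with judge scores. The function names, the toy document IDs, and the choice of Pearson correlation are all assumptions made for illustration.

```python
from math import sqrt


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant.

    Needs relevance labels only for the retrieved documents.
    """
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents found in the top k.

    Needs the complete relevant set, which is typically unknown
    for a large, evolving knowledge base.
    """
    if not relevant:
        raise ValueError("recall is undefined without a known relevant set")
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences.

    Assumption: the source does not say which correlation it uses.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / (norm_x * norm_y)


# Toy data (hypothetical): one query, four retrieved documents.
retrieved = ["d1", "d7", "d3", "d9"]
relevant = {"d1", "d3", "d5"}  # the full set is known only in this toy case

p_at_4 = precision_at_k(retrieved, relevant, 4)
r_at_4 = recall_at_k(retrieved, relevant, 4)
```

To compare a metric against judgments in this setup, one would compute the metric for each query, collect a judge score for the response generated from the same retrieved documents, and pass the two lists to `pearson`.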
