How important is Recall for Measuring Retrieval Quality?

Shelly Schwartz; Oleg Vasilyev; Randy Sawaya

arXiv:2512.20854·cs.CL·May 8, 2026

TL;DR

This paper examines how much recall matters when measuring retrieval quality. It proposes a new metric that performs well without knowing the total number of relevant documents and validates it through experiments with LLM-based judgments of response quality.

Contribution

It evaluates established strategies for measuring retrieval quality when recall cannot be computed, and introduces a simple, effective measure applicable in realistic, large-scale settings.

Findings

01

Correlation between retrieval metrics and LLM judgments varies across datasets.

02

The proposed measure performs well without knowing the total number of relevant documents.

03

Experiments across multiple datasets with few relevant documents per query (2-15) demonstrate the effectiveness of the new retrieval quality measure.

Abstract

In realistic retrieval settings with large and evolving knowledge bases, the total number of documents relevant to a query is typically unknown, and recall cannot be computed. In this paper, we evaluate several established strategies for handling this limitation by measuring the correlation between retrieval quality metrics and LLM-based judgments of response quality, where responses are generated from the retrieved documents. We conduct experiments across multiple datasets with a relatively low number of relevant documents (2-15). We also introduce a simple retrieval quality measure that performs well without requiring knowledge of the total number of relevant documents.

Code & Models

Datasets

primer-ai/retrieval-response
dataset · 39 dl
