One-Shot Labeling for Automatic Relevance Estimation
Sean MacAvaney, Luca Soldaini

TL;DR
This paper introduces One-Shot Labelers (1SL), which use large language models to predict the relevance of unjudged documents, improving offline search system evaluation and its statistical reliability.
Contribution
It presents novel methods for filling relevance assessment holes using large language models, enhancing evaluation accuracy and confidence in search system comparisons.
Findings
1SL predictions frequently disagree with human assessments.
1SL labels nonetheless produce more reliable system rankings than the single known relevant label alone.
The strongest approaches reach system ranking correlations above 0.86 with the full rankings.
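To make "system ranking correlation" concrete, here is a minimal sketch that compares how two label sets order the same systems. It assumes Kendall's tau as the correlation statistic, which this summary does not confirm, and every score below is an invented toy value, not a result from the paper.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a between two score lists over the same systems (assumes no ties)."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) > 0)
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return (concordant - discordant) / len(pairs)

# Hypothetical effectiveness scores for four systems (toy values for illustration only).
full_judgments = [0.62, 0.55, 0.48, 0.30]  # scored with complete relevance labels
single_label = [0.20, 0.25, 0.10, 0.05]    # scored with one known relevant document per query
filled_labels = [0.58, 0.52, 0.50, 0.28]   # scored after holes are filled with predicted labels

tau_single = kendall_tau(full_judgments, single_label)
tau_filled = kendall_tau(full_judgments, filled_labels)
```

A tau closer to 1 means the label set orders the systems more like the full judgments do. The absolute scores need not match for the ranking to agree.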
Abstract
Dealing with unjudged documents ("holes") in relevance assessments is a perennial problem when evaluating search systems with offline experiments. Holes can reduce the apparent effectiveness of retrieval systems during evaluation and introduce biases in models trained with incomplete data. In this work, we explore whether large language models can help us fill such holes to improve offline evaluations. We examine an extreme, albeit common, evaluation setting wherein only a single known relevant document per query is available for evaluation. We then explore various approaches for predicting the relevance of unjudged documents with respect to a query and the known relevant document, including nearest neighbor, supervised, and prompting techniques. We find that although the predictions of these One-Shot Labelers (1SL) frequently disagree with human assessments, the labels they produce…
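The hole-filling idea in the abstract can be sketched as follows. This is an illustrative stand-in for the nearest-neighbor family of approaches, not the authors' implementation: it assumes bag-of-words cosine similarity to the known relevant document and a fixed threshold, and the names `fill_holes` and `threshold`, along with the toy documents, are hypothetical.

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def fill_holes(ranking, docs, qrels, known_rel_id, threshold=0.5):
    """Copy qrels and add a predicted label for every unjudged document in the ranking."""
    filled = dict(qrels)
    for doc_id in ranking:
        if doc_id not in filled:  # a "hole": the document has no judgment
            sim = cosine(docs[doc_id], docs[known_rel_id])
            filled[doc_id] = int(sim >= threshold)  # assumed decision rule
    return filled

def precision_at_k(ranking, qrels, k):
    """Precision@k, counting unjudged documents as non-relevant."""
    return sum(qrels.get(d, 0) for d in ranking[:k]) / k

# Toy collection: d1 is the single known relevant document for the query.
docs = {
    "d1": "llamas are domesticated south american camelids",
    "d2": "domesticated llamas are south american pack animals",
    "d3": "stock prices fell sharply on monday",
}
qrels = {"d1": 1}
ranking = ["d2", "d1", "d3"]

filled = fill_holes(ranking, docs, qrels, known_rel_id="d1")
p_sparse = precision_at_k(ranking, qrels, k=2)
p_filled = precision_at_k(ranking, filled, k=2)
```

With only the single label, the unjudged `d2` counts as non-relevant and lowers the measured precision. After filling, it is credited because it closely resembles the known relevant document.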
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
