Can We Use Large Language Models to Fill Relevance Judgment Holes?

Zahra Abbasiantaeb; Chuan Meng; Leif Azzopardi; Mohammad Aliannejadi

arXiv:2405.05600 · cs.IR · May 10, 2024 · 3 cites

TL;DR

This paper investigates using Large Language Models (LLMs) to fill holes in the relevance judgments of test collections, especially in dynamic settings such as conversational search, and evaluates how the resulting judgments affect the consistency of system rankings.

Contribution

It introduces a method for extending test collections with LLM-generated relevance judgments that are grounded in existing human judgments, and highlights the challenges of aligning LLM judgments with human ones.
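To make the bookkeeping of "filling holes" concrete, here is a minimal Python sketch under stated assumptions: qrels and runs are plain dicts keyed by query ID, the names `fill_holes` and `depth` are hypothetical, and `judge` is a placeholder for whatever LLM call produces a relevance grade. It encodes only one reading of "grounded in human judgments" (existing human labels are never overwritten); it does not show how the paper prompts or fine-tunes the LLM.

```python
from typing import Callable, Dict, List

# Assumed layout: query_id -> {doc_id: relevance grade}
Qrels = Dict[str, Dict[str, int]]
# Assumed layout: query_id -> ranked list of doc_ids (best first)
Run = Dict[str, List[str]]


def fill_holes(
    human_qrels: Qrels,
    run: Run,
    judge: Callable[[str, str], int],
    depth: int = 10,
) -> Qrels:
    """Return a copy of human_qrels extended with judgments for the holes.

    Human labels always take precedence. `judge` (a hypothetical stand-in
    for an LLM call) is consulted only for documents in the top `depth`
    of the run that have no human label.
    """
    # Copy the inner dicts so the caller's human qrels are left untouched.
    filled: Qrels = {qid: dict(docs) for qid, docs in human_qrels.items()}
    for qid, ranked_docs in run.items():
        judged = filled.setdefault(qid, {})
        for doc_id in ranked_docs[:depth]:
            if doc_id not in judged:  # a hole: no existing judgment
                judged[doc_id] = judge(qid, doc_id)
    return filled


if __name__ == "__main__":
    human = {"q1": {"d1": 2, "d2": 0}}
    new_run = {"q1": ["d1", "d3", "d2", "d4"]}
    # Placeholder judge that labels every hole as relevance grade 1.
    print(fill_holes(human, new_run, lambda qid, doc_id: 1, depth=3))
    # {'q1': {'d1': 2, 'd2': 0, 'd3': 1}}
```

Only `d3` is filled: `d1` and `d2` already carry human labels, and `d4` falls below the assumed judging depth of 3.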

Findings

01

LLM-based automatic judgments correlate less with human judgments than previous work has reported.

02

Using LLMs to judge the entire document pool, rather than only the holes, yields more consistent system rankings.

03

The effect of LLM judgments varies with the LLM used and with the size of the holes.
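The findings concern how closely system rankings agree under different sets of judgments. A common way to measure that agreement in IR evaluation is Kendall's tau between the two orderings of systems; the sketch below assumes that measure and computes tau-a directly (SciPy's `scipy.stats.kendalltau` defaults to tau-b, which adjusts for ties). Each input maps a hypothetical run name to its score under one set of judgments, for example human-only qrels versus qrels whose holes were filled by an LLM, or qrels where the LLM judged the whole pool. The scores in the demo are made up for illustration and are not results from the paper.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Kendall's tau-a between two rankings of the same systems.

    Each dict maps a system (run) name to its effectiveness score under
    one set of relevance judgments. Pairs tied in either scoring count as
    neither concordant nor discordant.
    """
    systems = sorted(set(scores_a) & set(scores_b))
    n = len(systems)
    if n < 2:
        raise ValueError("need at least two systems scored under both qrels")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Positive product: both scorings order s and t the same way.
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical scores for three runs under two sets of judgments.
    human_only = {"run_a": 0.42, "run_b": 0.35, "run_c": 0.30}
    llm_filled = {"run_a": 0.40, "run_b": 0.28, "run_c": 0.31}
    print(round(kendall_tau(human_only, llm_filled), 3))  # 0.333
```

Two of the three system pairs keep their relative order and one (`run_b` vs. `run_c`) flips, giving (2 - 1) / 3. A value of 1.0 would mean the two sets of judgments rank the systems identically.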

Abstract

Incomplete relevance judgments limit the reusability of test collections. When new systems are compared against the previous systems used to build the pool of judged documents, they are often at a disadvantage due to the "holes" in the test collection (i.e., pockets of unassessed documents returned by the new system). In this paper, we take initial steps towards extending existing test collections by employing Large Language Models (LLMs) to fill the holes, leveraging and grounding the method in existing human judgments. We explore this problem in the context of conversational search using TREC iKAT, where information needs are highly dynamic and the responses (and the retrieved results) are much more varied, leaving bigger holes. While previous work has shown that automatic judgments from LLMs result in highly correlated rankings, we find substantially lower correlations when human…
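The size of a hole can be quantified per run. The sketch below assumes one simple definition, the fraction of a run's top-k documents that have no judgment, averaged over queries; the function name `hole_rate` and the cutoff `k` are hypothetical illustrations, not the paper's exact measure.

```python
from typing import Dict, List


def hole_rate(
    qrels: Dict[str, Dict[str, int]],
    run: Dict[str, List[str]],
    k: int = 10,
) -> float:
    """Mean fraction of each query's top-k retrieved documents that are unjudged.

    Queries with an empty ranking are skipped; returns 0.0 if none remain.
    """
    rates = []
    for qid, ranked_docs in run.items():
        top_k = ranked_docs[:k]
        if not top_k:
            continue
        judged = qrels.get(qid, {})
        unjudged = sum(1 for doc_id in top_k if doc_id not in judged)
        rates.append(unjudged / len(top_k))
    return sum(rates) / len(rates) if rates else 0.0


if __name__ == "__main__":
    qrels = {"q1": {"d1": 1, "d2": 0}, "q2": {"d5": 2}}
    new_run = {"q1": ["d1", "d3", "d2", "d4"], "q2": ["d5", "d6"]}
    print(hole_rate(qrels, new_run, k=4))  # 0.5
```

Here `q1` has two unjudged documents out of four and `q2` one out of two, so both queries contribute 0.5. A run whose results differ more from the pooled systems, as the abstract describes for conversational search, would show a higher rate.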

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN