LLM-Assisted Relevance Assessments: When Should We Ask LLMs for Help?

Rikiya Takehi; Ellen M. Voorhees; Tetsuya Sakai; Ian Soboroff

arXiv:2411.06877·cs.IR·July 15, 2025·3 cites

Open Access · 1 Repo

TL;DR

This paper introduces LLM-Assisted Relevance Assessments (LARA), a method that combines human judgment and LLM predictions to efficiently create reliable test collections for information retrieval evaluation, especially under budget constraints.

Contribution

LARA actively calibrates and debiases LLM relevance predictions to optimize manual annotation efforts, improving test collection quality with limited resources.
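As a rough illustration of what "calibrating LLM predictions with a limited manual budget" can look like, the sketch below spends the human budget on the documents where the LLM is least certain, then fits a Platt-style logistic calibration on those human labels and applies it to the rest. This is a minimal sketch under stated assumptions, not the procedure from the paper or the RikiyaT/LARA repository: the uncertainty-based selection rule, the calibration model, and the names `fit_calibration`, `assisted_judgments`, and `ask_human` are all hypothetical.

```python
import math

def logit(p, eps=1e-6):
    # Clip so that probabilities of exactly 0 or 1 do not produce infinities.
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_calibration(probs, labels, lr=0.1, steps=500):
    """Fit q = sigmoid(a * logit(p) + b) to human labels by gradient descent.

    Hypothetical Platt-style calibration; returns the identity map (1.0, 0.0)
    when there are no human labels to learn from.
    """
    a, b = 1.0, 0.0
    xs = [logit(p) for p in probs]
    n = len(xs)
    if n == 0:
        return a, b
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(a * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def assisted_judgments(llm_probs, ask_human, budget):
    """Return one 0/1 label per document using at most `budget` human judgments.

    llm_probs: LLM-estimated probability of relevance for each document.
    ask_human: callable taking a document index and returning 0 or 1 (assumed).
    """
    # Assumption: the most uncertain documents (probability nearest 0.5)
    # are the ones most worth sending to a human assessor.
    order = sorted(range(len(llm_probs)), key=lambda i: abs(llm_probs[i] - 0.5))
    human = {i: ask_human(i) for i in order[:budget]}
    manual = list(human)
    a, b = fit_calibration([llm_probs[i] for i in manual],
                           [human[i] for i in manual])
    labels = []
    for i, p in enumerate(llm_probs):
        if i in human:
            labels.append(human[i])  # human judgments always take precedence
        else:
            labels.append(1 if sigmoid(a * logit(p) + b) >= 0.5 else 0)
    return labels
```

With a budget of zero the function falls back to thresholding the raw LLM probabilities at 0.5; as the budget grows, more labels come directly from assessors and the calibration has more evidence to correct systematic LLM bias.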

Findings

01. LARA outperforms alternative methods across multiple datasets.

02. It effectively balances manual and LLM annotations under various budgets.

03. LARA enhances the reliability of test collections with reduced manual effort.

Abstract

Test collections are information-retrieval tools that allow researchers to quickly and easily evaluate ranking algorithms. While test collections have become an integral part of IR research, the process of data creation involves significant manual-annotation effort, which often makes it very expensive and time-consuming. Consequently, test collections can become too small when the budget is limited, which may lead to unstable evaluations. As a cheaper alternative, recent studies have proposed using large language models (LLMs) to completely replace human assessors. However, while LLMs correlate to some extent with human judgments, their predictions are not perfect and often show bias. Thus, a complete replacement with LLMs is considered too risky and not fully reliable. In this paper, we propose LLM-Assisted Relevance Assessments (LARA), an effective method to balance manual…

Code & Models

Repositories

RikiyaT/LARA
PyTorch · Official

Taxonomy

Topics: Artificial Intelligence in Law

Methods: High-Order Consensuses