Criteria-Based LLM Relevance Judgments

Naghmeh Farzi; Laura Dietz

arXiv:2507.09488·cs.IR·July 15, 2025

TL;DR

This paper introduces a Multi-Criteria framework for LLM-based relevance judgments, decomposing relevance into multiple criteria to improve robustness, interpretability, and system evaluation in information retrieval.

Contribution

The paper proposes a novel Multi-Criteria approach for LLM relevance judgments, enhancing interpretability and evaluation robustness over traditional direct grading methods.

Findings

1. Multi-Criteria judgments improve system ranking performance.
2. The approach offers better interpretability of relevance assessments.
3. Strengths and limitations of the method are analyzed.

Abstract

Relevance judgments are crucial for evaluating information retrieval systems, but traditional human-annotated labels are time-consuming and expensive. As a result, many researchers turn to automatic alternatives to accelerate method development. Among these, Large Language Models (LLMs) provide a scalable solution by generating relevance labels directly through prompting. However, prompting an LLM for a relevance label without constraints often results in not only incorrect predictions but also outputs that are difficult for humans to interpret. We propose the Multi-Criteria framework for LLM-based relevance judgments, decomposing the notion of relevance into multiple criteria (such as exactness, coverage, topicality, and contextual fit) to improve the robustness and interpretability of retrieval evaluations compared to direct grading methods. We validate this approach on three…
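
To make the decomposition concrete, here is a minimal Python sketch of grading a passage per criterion and combining the grades into one relevance label. Only the four criterion names come from the abstract. The 0 to 3 grade scale, the `Grader` callable (a stand-in for an LLM prompt), the `judge` function, and the rounded-mean aggregation are all assumptions made for illustration, and may differ from what the paper actually does.

```python
from typing import Callable, Dict

# Criterion names taken from the abstract.
CRITERIA = ("exactness", "coverage", "topicality", "contextual fit")

# Hypothetical grader signature: (criterion, query, passage) -> integer grade.
# In practice this would wrap one LLM prompt per criterion.
Grader = Callable[[str, str, str], int]


def judge(query: str, passage: str, grade: Grader, max_grade: int = 3) -> Dict[str, object]:
    """Grade a passage on each criterion, then aggregate into one label."""
    per_criterion: Dict[str, int] = {}
    for criterion in CRITERIA:
        raw = int(grade(criterion, query, passage))
        # Clamp to the assumed 0..max_grade scale in case the grader misbehaves.
        per_criterion[criterion] = min(max(raw, 0), max_grade)

    total = sum(per_criterion.values())
    n = len(CRITERIA)
    # Assumed aggregation: mean of the criterion grades, rounded half up.
    relevance = (2 * total + n) // (2 * n)

    # Returning the per-criterion grades keeps the judgment inspectable.
    return {"criteria": per_criterion, "relevance": relevance}


if __name__ == "__main__":
    # Fixed grades standing in for LLM output.
    fixed = {"exactness": 3, "coverage": 2, "topicality": 3, "contextual fit": 1}
    result = judge("example query", "example passage", lambda c, q, p: fixed[c])
    print(result["criteria"])
    print(result["relevance"])
```

Because the per-criterion grades are returned alongside the final label, a reader can see which criterion pulled the label up or down. That is the kind of interpretability the abstract contrasts with direct grading.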
