Criteria-Based LLM Relevance Judgments
Naghmeh Farzi, Laura Dietz

TL;DR
This paper introduces a Multi-Criteria framework for LLM-based relevance judgments, decomposing relevance into multiple criteria to improve robustness, interpretability, and system evaluation in information retrieval.
Contribution
The paper proposes a Multi-Criteria approach for LLM relevance judgments that makes judgments more interpretable and evaluation more robust than traditional direct grading methods.
Findings
Multi-Criteria judgments improve system ranking performance relative to direct grading.
The approach offers better interpretability of relevance assessments.
Strengths and limitations of the method are analyzed.
Abstract
Relevance judgments are crucial for evaluating information retrieval systems, but traditional human-annotated labels are time-consuming and expensive. As a result, many researchers turn to automatic alternatives to accelerate method development. Among these, Large Language Models (LLMs) provide a scalable solution by generating relevance labels directly through prompting. However, prompting an LLM for a relevance label without constraints often results in not only incorrect predictions but also outputs that are difficult for humans to interpret. We propose the Multi-Criteria framework for LLM-based relevance judgments, decomposing the notion of relevance into multiple criteria (such as exactness, coverage, topicality, and contextual fit) to improve the robustness and interpretability of retrieval evaluations compared to direct grading methods. We validate this approach on three…
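The decomposition described in the abstract can be illustrated with a minimal sketch. Only the four criterion names come from the abstract. The 0 to 3 grade scale, the `grade_fn` callable (a stand-in for an LLM prompt), the `stub_grader` helper, and the floor-of-the-mean aggregation are all assumptions made for illustration and may differ from the authors' actual method.

```python
from typing import Callable, Dict

# Criterion names taken from the abstract.
CRITERIA = ("exactness", "coverage", "topicality", "contextual fit")

# Hypothetical grader signature: (criterion, query, passage) -> grade in 0..3.
# In practice this would wrap one LLM prompt per criterion.
GradeFn = Callable[[str, str, str], int]


def grade_criteria(query: str, passage: str, grade_fn: GradeFn) -> Dict[str, int]:
    """Grade the passage once per criterion and keep every grade for inspection."""
    return {c: grade_fn(c, query, passage) for c in CRITERIA}


def aggregate(scores: Dict[str, int]) -> int:
    """Assumed aggregation: floor of the mean criterion grade (not the paper's rule)."""
    return sum(scores.values()) // len(scores)


def stub_grader(criterion: str, query: str, passage: str) -> int:
    """Fixed grades standing in for LLM responses, so the sketch runs offline."""
    fixed = {"exactness": 3, "coverage": 2, "topicality": 3, "contextual fit": 2}
    return fixed[criterion]


scores = grade_criteria("example query", "example passage", stub_grader)
label = aggregate(scores)
```

Because the per-criterion grades are kept alongside the final label, a reader can see which criterion pulled a judgment up or down, which is the interpretability benefit the abstract points to.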
