JudgeBlender: Ensembling Judgments for Automatic Relevance Assessment

Hossein A. Rahmani; Emine Yilmaz; Nick Craswell; Bhaskar Mitra

arXiv:2412.13268 · cs.IR · December 19, 2024 · 2 cites

TL;DR

JudgeBlender is a framework that combines multiple smaller open-source models and prompts to generate reliable relevance judgments, reducing reliance on expensive large LLMs and improving efficiency in search system evaluation.

Contribution

It introduces a novel ensembling approach using open-source models and prompts for relevance assessment, challenging the need for large LLMs like GPT-4.

Findings

01. JudgeBlender achieves competitive performance on the LLMJudge benchmark.

02. Ensembling multiple models or prompts improves relevance judgment reliability.

03. Smaller models can effectively replace large LLMs for evaluation tasks.

Abstract

The effective training and evaluation of retrieval systems require a substantial amount of relevance judgments, which are traditionally collected from human assessors, a process that is both costly and time-consuming. Large Language Models (LLMs) have shown promise in generating relevance labels for search tasks, offering a potential alternative to manual assessments. Current approaches often rely on a single LLM, such as GPT-4, which, despite being effective, is expensive and prone to intra-model biases that can favour systems leveraging similar models. In this work, we introduce JudgeBlender, a framework that employs smaller, open-source models to provide relevance judgments by combining evaluations across multiple LLMs (LLMBlender) or multiple prompts (PromptBlender). By leveraging the LLMJudge benchmark [18], we compare JudgeBlender with state-of-the-art methods and the top…
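
To make the ensembling idea in the abstract concrete, here is a minimal sketch of how labels from several judges (several models, or several prompts to one model) might be blended into one relevance label per query-passage pair. The abstract does not say which aggregation rule JudgeBlender uses, so the helper names (`majority_vote`, `average_vote`, `blend`), the integer label scale and the tie-breaking rule are all assumptions made for illustration, not the paper's method or the repository's API.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Return the most frequent label; ties go to the lower label (an assumed rule)."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def average_vote(labels):
    """Return the mean label rounded half up to an integer grade."""
    return int(mean(labels) + 0.5)


def blend(judgments, aggregate=majority_vote):
    """Aggregate per-judge labels into one label per (query_id, passage_id) pair.

    `judgments` maps each pair to the list of integer labels produced by the
    individual judges; this input layout is hypothetical.
    """
    return {pair: aggregate(labels) for pair, labels in judgments.items()}


if __name__ == "__main__":
    # Hypothetical labels from three judges on an assumed 0-3 graded scale.
    example = {
        ("q1", "p1"): [2, 2, 3],
        ("q1", "p2"): [0, 1, 1],
    }
    print(blend(example))
    print(blend(example, aggregate=average_vote))
```

Majority voting keeps the output on the original label scale and resists a single outlying judge, while averaging uses every judge's grade and may suit finer-grained scales. Which rule fits best would need to be checked against human labels.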

Code & Models

Repositories

rahmanidashti/judgeblender (PyTorch, official)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning

Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Adam · Layer Normalization · Softmax