Tuning LLM Judge Design Decisions for 1/1000 of the Cost

David Salinas; Omar Swelam; Frank Hutter

arXiv:2501.17178 · cs.CL · May 28, 2025

TL;DR

This paper introduces a systematic approach to optimizing LLM-based judges: it tunes their hyperparameters with multi-objective multi-fidelity optimization, which yields better accuracy-cost trade-offs, and it relies on open-weight models to keep evaluation accessible.

Contribution

The paper presents a hyperparameter tuning method for LLM judges that uses multi-objective multi-fidelity optimization to lower the cost of the search and to find judges with better accuracy-cost trade-offs.
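To make the multi-objective part concrete, here is a minimal sketch of selecting the accuracy-cost Pareto front from a set of evaluated judge configurations. The configuration names and all numbers are made-up placeholders for illustration, not results from the paper, and `JudgeResult`, `dominates`, and `pareto_front` are hypothetical helpers rather than the authors' code.

```python
from typing import NamedTuple


class JudgeResult(NamedTuple):
    name: str        # hypothetical label for a judge configuration
    accuracy: float  # higher is better (e.g. agreement with human preferences)
    cost: float      # lower is better (e.g. cost of one evaluation run)


def dominates(a: JudgeResult, b: JudgeResult) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(results: list[JudgeResult]) -> list[JudgeResult]:
    """Keep only configurations that no other configuration dominates."""
    front = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(front, key=lambda r: r.cost)


# Toy values chosen purely for illustration.
candidates = [
    JudgeResult("small-model/short-prompt", 0.70, 1.0),
    JudgeResult("small-model/long-prompt", 0.68, 2.0),
    JudgeResult("large-model/short-prompt", 0.80, 8.0),
    JudgeResult("large-model/long-prompt", 0.78, 12.0),
]
front_names = [r.name for r in pareto_front(candidates)]
print(front_names)
```

Every judge on the front is a defensible choice: moving along it buys accuracy only by paying more, which is the trade-off the summary refers to.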

Findings

01. The judges identified by the search outperform existing benchmarks in accuracy and cost-efficiency.

02. The judges rely on open-weight models, which improves accessibility and reproducibility.

03. Multi-fidelity methods significantly reduce the cost of the search.
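One common way to realize multi-fidelity savings is successive halving: score every candidate on a small number of evaluation examples, discard the weaker half, and re-score the survivors on more examples. The sketch below assumes that scheme for illustration only; this page does not state which multi-fidelity algorithm the paper uses. The judge names, the `true_quality` table, and the noise model are invented so that the example runs on its own.

```python
import random


def successive_halving(configs, evaluate, min_fidelity, max_fidelity, eta=2):
    """Return (best config, total examples scored).

    `evaluate(config, fidelity)` returns a score, higher is better;
    `fidelity` is the number of evaluation examples used for that score.
    """
    survivors = list(configs)
    fidelity = min_fidelity
    total_examples = 0
    while True:
        scored = [(evaluate(c, fidelity), c) for c in survivors]
        total_examples += fidelity * len(survivors)
        scored.sort(key=lambda pair: pair[0], reverse=True)
        if fidelity >= max_fidelity or len(survivors) == 1:
            return scored[0][1], total_examples
        # Keep the top 1/eta of candidates and give them a larger budget.
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scored[:keep]]
        fidelity = min(max_fidelity, fidelity * eta)


# Synthetic stand-in for "agreement with human labels"; not real data.
rng = random.Random(0)
true_quality = {f"judge-{i}": 0.50 + 0.05 * i for i in range(8)}


def noisy_agreement(config, fidelity):
    # Assumed noise model: the estimate tightens as more examples are used.
    half_width = 0.2 / fidelity
    return true_quality[config] + rng.uniform(-half_width, half_width)


best, spent = successive_halving(list(true_quality), noisy_agreement, 10, 80)
full_cost = len(true_quality) * 80  # scoring every judge at full fidelity
print(best, spent, full_cost)
```

In this toy setup the search scores 320 examples in total, against 640 for evaluating all eight judges at full fidelity, and the gap widens as the number of candidates grows.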

Abstract

Evaluating Large Language Models (LLMs) often requires costly human annotations. To address this, LLM-based judges have been proposed, which compare the outputs of two LLMs, enabling the ranking of models without human intervention. While several approaches have been proposed, many confounding factors are present between different papers. For instance, the model, the prompt, and other hyperparameters are typically changed at the same time, making apples-to-apples comparisons challenging. In this paper, we propose to systematically analyze and tune the hyperparameters of LLM judges. To alleviate the high cost of evaluating a judge, we propose to leverage multi-objective multi-fidelity optimization, which makes it possible to find judges that trade accuracy for cost and also significantly reduces the cost of the search. Our method identifies judges that not only outperform existing benchmarks in accuracy and…
