Verdict: A Library for Scaling Judge-Time Compute

Nimit Kalra; Leonard Tang

arXiv:2502.18018·cs.CL·November 6, 2025

TL;DR

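Verdict is an open-source library that improves the accuracy, reliability, and interpretability of LLM-based judges by composing modular reasoning units and spending more compute at judge time. Its judges perform competitively with much larger fine-tuned judges on tasks such as content moderation, fact-checking, and hallucination detection.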

Contribution

We introduce Verdict, a modular framework that scales judge-time compute to improve LLM judge quality across multiple evaluation tasks.

Findings

01

Verdict achieves performance comparable to larger fine-tuned judges.

02

It improves reliability and interpretability of automated evaluations.

03

It is effective across tasks such as content moderation, fact-checking, and hallucination detection.

Abstract

The use of LLMs as automated judges ("LLM-as-a-judge") is now widespread, yet standard judges suffer from a multitude of reliability issues. To address these challenges, we introduce Verdict, an open-source library for scaling judge-time compute to enhance the accuracy, reliability, and interpretability of automated evaluators. Verdict leverages the composition of modular reasoning units (such as verification, debate, and aggregation) and increased inference-time compute to improve LLM judge quality. Across a variety of challenging tasks such as content moderation, fact-checking, and hallucination detection, Verdict judges achieve performance competitive with orders-of-magnitude larger fine-tuned judges, prompted judges, and reasoning models. Our framework establishes a foundation for scalable, interpretable, and reliable LLM-based evaluation systems for both researchers and…
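The idea of composing reasoning units can be illustrated with a small sketch. The code below does not use Verdict's actual API; every name in it (`Judge`, `Verifier`, `keyword_judge`, `run_pipeline`, and so on) is hypothetical, and the "judges" are simple keyword rules standing in for LLM calls. It shows only the general pattern the abstract describes: several independent judgments, a verification step that filters them, and an aggregation step that combines what remains.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical types for illustration only (not Verdict's API).
# A judge maps an input text to a label such as "safe" or "unsafe".
Judge = Callable[[str], str]
# A verifier decides whether a judge's label for a text should be kept.
Verifier = Callable[[str, str], bool]


def keyword_judge(keyword: str) -> Judge:
    """Build a stand-in judge that flags text containing `keyword`.

    In a real system this would be an LLM call; a keyword rule keeps the
    sketch deterministic and runnable without any external service.
    """
    def judge(text: str) -> str:
        return "unsafe" if keyword in text.lower() else "safe"
    return judge


def label_verifier(text: str, label: str) -> bool:
    """Stand-in verification unit: accept only well-formed labels."""
    return label in {"safe", "unsafe"}


def aggregate_majority(labels: List[str]) -> str:
    """Aggregation unit: return the most common label.

    Ties are broken in favour of the label that appeared first.
    """
    return Counter(labels).most_common(1)[0][0]


def run_pipeline(
    text: str,
    judges: List[Judge],
    verifier: Verifier,
    fallback: str = "abstain",
) -> str:
    """Compose the units: judge, then verify, then aggregate."""
    votes = [judge(text) for judge in judges]
    verified = [vote for vote in votes if verifier(text, vote)]
    if not verified:
        # No judgment survived verification, so return the fallback label.
        return fallback
    return aggregate_majority(verified)


if __name__ == "__main__":
    judges = [keyword_judge("attack"), keyword_judge("bomb"), keyword_judge("weather")]
    print(run_pipeline("How to attack with a bomb", judges, label_verifier))
    print(run_pipeline("Nice weather today", judges, label_verifier))
```

Adding more judges to the list is the simplest way to spend more compute per judgment in this sketch; swapping `aggregate_majority` for a different combiner changes how disagreements are resolved without touching the other units.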

Code & Models

Repositories

haizelabs/verdict
Official

Taxonomy

Topics: Dispute Resolution and Class Actions · Artificial Intelligence in Law

Methods: Lib