Improve LLM-as-a-Judge Ability as a General Ability

Jiachen Yu; Shaoning Sun; Xiaohui Hu; Jiaxu Yan; Kaidong Yu; Xuelong Li

arXiv:2502.11689 · cs.CL · September 9, 2025 · 2 cites

TL;DR

This paper presents a two-stage training approach for large language models to improve their ability as general judges, enhancing accuracy and efficiency in evaluating responses across diverse scenarios, with state-of-the-art results.

Contribution

The work introduces a novel two-stage training method combining supervised fine-tuning and preference optimization, along with an efficient data synthesis technique, to improve LLMs' judging capabilities with less data.

Findings

01 Achieves state-of-the-art performance on RewardBench.

02 Requires only 2-40% of the data used by other methods.

03 Enhances downstream policy optimization through improved judge signals.

Abstract

LLM-as-a-Judge leverages the generative and reasoning capabilities of large language models (LLMs) to evaluate LLM responses across diverse scenarios, providing accurate preference signals. This approach plays a vital role in aligning LLMs with human values, ensuring ethical and reliable AI outputs that align with societal norms. Recent studies have proposed many methods for training LLMs as generative judges, but most of them are data-intensive or lack accuracy, and they focus only on the LLM's judge ability. In this work, we regard judge ability as a general ability of LLMs and implement a two-stage training approach, comprising a supervised fine-tuning (SFT) warm-up and a direct preference optimization (DPO) enhancement, to achieve judge-style adaptation and improve judgment accuracy. Additionally, we introduce an efficient data synthesis method to generate judgmental content. Experimental results…
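As a rough illustration of the DPO stage named in the abstract, the sketch below computes the standard DPO loss for a single preference pair from summed token log-probabilities. It is not the authors' training code: the function names, the argument names, and the default `beta` value are assumptions chosen for illustration, and the log-probabilities would in practice come from the policy model and a frozen reference model (for example, the SFT warm-up checkpoint).

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default; the abstract does not state a value
) -> float:
    """Standard DPO loss for one (chosen, rejected) judgment pair.

    Each argument is the summed log-probability of a full response
    under the policy or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # The loss shrinks as the chosen response is favored over the rejected one.
    return -log_sigmoid(chosen_reward - rejected_reward)


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Policy raises the chosen judgment and lowers the rejected one: loss drops.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

When the policy matches the reference, the margin is zero and the loss is log 2; any shift of probability toward the chosen judgment relative to the reference lowers it.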

Taxonomy

Topics: Legal Education and Practice Innovations · Artificial Intelligence in Law · Dispute Resolution and Class Actions

Methods: Direct Preference Optimization · ALIGN · Focus