CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards

Taolin Zhang; Maosong Cao; Alexander Lam; Songyang Zhang; Kai Chen

arXiv:2507.09104 · cs.CL · July 15, 2025

1 Repo 2 Models

TL;DR

CompassJudger-2 is a generalist judge model trained with verifiable rewards. It achieves strong accuracy across multiple judge and reward benchmarks, and its 7B variant is competitive with significantly larger models.

Contribution

The paper introduces CompassJudger-2, a generalist judge model trained with verifiable rewards, rejection sampling, and a margin policy gradient loss, together with JudgerBenchV2, a benchmark for cross-domain judgment evaluation.

Findings

01. Achieves superior performance on multiple judge and reward benchmarks.

02. The 7B model demonstrates judgment accuracy competitive with significantly larger models such as DeepSeek-V3 and Qwen3-235B-A22B.

03. Proposes JudgerBenchV2 for standardized evaluation of judge models.

Abstract

Recently, the role of LLM-as-judge in evaluating large language models has gained prominence. However, current judge models suffer from narrow specialization and limited robustness, undermining their capacity for comprehensive evaluations. In this work, we present CompassJudger-2, a novel generalist judge model that overcomes these limitations via a task-driven, multi-domain data curation strategy. Central to our approach is supervising judgment tasks with verifiable rewards, guiding intrinsic critical reasoning through rejection sampling to foster robust, generalizable judgment capabilities. We introduce a refined learning objective with margin policy gradient loss to enhance performance. Empirically, CompassJudger-2 achieves superior results across multiple judge and reward benchmarks, and our 7B model demonstrates competitive judgment accuracy with significantly larger models like…
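The abstract's idea of supervising judgments with verifiable rewards and filtering them by rejection sampling can be illustrated with a minimal sketch. This is not the authors' implementation: the `Verdict: A/B` output format, the function names (`extract_verdict`, `verifiable_reward`, `rejection_sample`), and the sampler interface are all assumptions made for illustration. The sketch only shows the general pattern of sampling several judgments for a prompt with a known correct verdict and keeping those whose final verdict matches it.

```python
from typing import Callable, List, Optional


def extract_verdict(judgment: str) -> Optional[str]:
    """Return 'A' or 'B' from the last line starting with 'Verdict:'.

    The 'Verdict: X' format is an assumption of this sketch, not the
    paper's actual output format.
    """
    for line in reversed(judgment.strip().splitlines()):
        line = line.strip()
        if line.startswith("Verdict:"):
            choice = line[len("Verdict:"):].strip()
            return choice if choice in ("A", "B") else None
    return None


def verifiable_reward(judgment: str, gold: str) -> int:
    """Binary reward: 1 if the parsed verdict equals the gold label, else 0."""
    return 1 if extract_verdict(judgment) == gold else 0


def rejection_sample(
    prompt: str,
    gold: str,
    sample_fn: Callable[[str], str],
    k: int = 8,
) -> List[str]:
    """Draw k judgments from a hypothetical sampler and keep the verified ones."""
    candidates = [sample_fn(prompt) for _ in range(k)]
    return [c for c in candidates if verifiable_reward(c, gold) == 1]


if __name__ == "__main__":
    # Stand-in for a judge model; a real sampler would call an LLM.
    def toy_judge(prompt: str) -> str:
        return "Response A answers the question directly.\nVerdict: A"

    kept = rejection_sample("Which response is better?", "A", toy_judge, k=4)
    print(f"kept {len(kept)} of 4 sampled judgments")
```

The retained judgments, reasoning included, could then serve as training data. The margin policy gradient loss mentioned in the abstract is not sketched here, because its exact form is not given in this text.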

Code & Models

Repositories

open-compass/compassjudger
PyTorch · Official

Models
