CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
Maosong Cao, Alexander Lam, Haodong Duan, Hongwei Liu, Songyang Zhang, Kai Chen

TL;DR
CompassJudger-1 is an all-in-one open-source judge LLM designed to improve the accuracy, versatility, and reproducibility of model evaluations, supporting various assessment formats and tasks to advance LLM development.
Contribution
It introduces CompassJudger-1, the first versatile open-source judge model capable of multiple evaluation tasks, and establishes JudgerBench, a comprehensive benchmark for subjective evaluation tasks.
Findings
CompassJudger-1 demonstrates high versatility across evaluation tasks.
JudgerBench provides a unified platform for assessing judge models.
Open-sourcing accelerates research in LLM evaluation methodologies.
Abstract
Efficient and accurate evaluation is crucial for the continuous improvement of large language models (LLMs). Among various assessment methods, subjective evaluation has garnered significant attention due to its superior alignment with real-world usage scenarios and human preferences. However, human-based evaluations are costly and lack reproducibility, making precise automated evaluators (judgers) vital in this process. In this report, we introduce CompassJudger-1, the first open-source all-in-one judge LLM. CompassJudger-1 is a general-purpose LLM that demonstrates remarkable versatility. It is capable of: 1. Performing unitary scoring and two-model comparisons as a reward model; 2. Conducting evaluations according to specified formats; 3. Generating critiques; 4. Executing diverse tasks like a general LLM. To assess the evaluation capabilities of different judge…
Code & Models
- 🤗 opencompass/CompassJudger-1-7B-Instruct · model · 161 dl · ♡ 10
- 🤗 opencompass/CompassJudger-1-32B-Instruct · model · 19 dl · ♡ 18
- 🤗 opencompass/CompassJudger-1-14B-Instruct · model · 82 dl · ♡ 2
- 🤗 opencompass/CompassJudger-1-1.5B-Instruct · model · 21 dl · ♡ 1
- 🤗 KnutJaegersberg/CompassJudger-1-32B-Instruct-exl2-8.0bpw · model · 1 dl
- 🤗 KnutJaegersberg/CompassJudger-1-14B-Instruct-exl2-8.0bpw · model
- 🤗 RichardErkhov/opencompass_-_CompassJudger-1-32B-Instruct-gguf · model · 216 dl
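
The abstract mentions two-model comparison as one of the judge's modes. A minimal sketch of how such a comparison could be wired up is given here, under loud assumptions: the prompt template, the `[[A]]`/`[[B]]`/`[[C]]` verdict markers, and the function names are all hypothetical and not taken from the paper or the model cards. The `generate` argument stands in for whatever backend serves one of the checkpoints listed above; a canned stub is used so the snippet runs without downloading a model.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt template (an assumption for illustration only;
# the judge prompts actually used by CompassJudger-1 are not shown on this page).
PAIRWISE_TEMPLATE = (
    "You are an impartial judge. Compare the two responses to the question.\n"
    "Question: {question}\n"
    "Response A: {answer_a}\n"
    "Response B: {answer_b}\n"
    "Explain briefly, then finish with your verdict as [[A]], [[B]], "
    "or [[C]] for a tie."
)

# Assumed verdict format: a single letter inside double square brackets.
VERDICT_RE = re.compile(r"\[\[([ABC])\]\]")


def build_pairwise_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Fill the (hypothetical) template with one question and two answers."""
    return PAIRWISE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )


def parse_verdict(judgement: str) -> Optional[str]:
    """Return the last verdict marker in the judge output, or None if absent."""
    matches = VERDICT_RE.findall(judgement)
    return matches[-1] if matches else None


def judge_pair(
    question: str,
    answer_a: str,
    answer_b: str,
    generate: Callable[[str], str],
) -> Optional[str]:
    """Ask the judge backend to compare two answers and parse its verdict."""
    prompt = build_pairwise_prompt(question, answer_a, answer_b)
    return parse_verdict(generate(prompt))


def fake_generate(prompt: str) -> str:
    """Stub backend that returns a canned judgement instead of calling a model."""
    return "Response A gives the correct sum; Response B does not. [[A]]"


if __name__ == "__main__":
    print(judge_pair("What is 2 + 2?", "4", "5", fake_generate))
```

In practice `fake_generate` would be replaced by a function that sends the prompt to a served checkpoint and returns its text output. Taking the last marker in the output is a design choice that tolerates judges that mention candidate labels in their reasoning before committing to a verdict.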
Taxonomy
Topics: Simulation Techniques and Applications · Scientific Computing and Data Management · Evolutionary Algorithms and Applications
Methods: Softmax · Attention Is All You Need
