On Evaluating LLM Alignment by Evaluating LLMs as Judges

Yixin Liu; Pengfei Liu; Arman Cohan

arXiv:2511.20604 · cs.CL · November 26, 2025

TL;DR

This paper examines whether the alignment of large language models (LLMs) with human preferences can be measured through their ability to act as judges. It proposes a benchmark that scores alignment by evaluation performance rather than by directly assessing model outputs.

Contribution

The paper introduces AlignEval, a benchmark that measures an LLM's alignment by evaluating its ability to judge responses. The underlying analysis reveals a strong correlation between the generation and evaluation capabilities of LLMs.

Findings

01

AlignEval matches or exceeds existing benchmarks in capturing human preferences.

02

LLMs' generation and evaluation capabilities are strongly correlated.

03

LLM alignment can be assessed through judging performance, without directly evaluating model outputs.
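To make the idea of scoring a model by its judging performance concrete, here is a minimal sketch, assuming a pairwise setup in which the model under test picks the better of two responses and is scored by agreement with reference preference labels. The names `judge_accuracy`, `length_judge`, and `toy_items` are hypothetical, the data is invented for illustration, and nothing here is taken from the AlignEval implementation.

```python
from typing import Callable, List, Tuple

# One pairwise item: (instruction, response_a, response_b, preferred),
# where preferred is "A" or "B". The labels below are toy values,
# not data from the paper.
PairItem = Tuple[str, str, str, str]


def judge_accuracy(
    judge: Callable[[str, str, str], str], items: List[PairItem]
) -> float:
    """Fraction of items on which the judge's choice matches the reference label."""
    if not items:
        raise ValueError("items must be non-empty")
    correct = sum(
        1 for instruction, a, b, gold in items if judge(instruction, a, b) == gold
    )
    return correct / len(items)


def length_judge(instruction: str, response_a: str, response_b: str) -> str:
    """Hypothetical stand-in for an LLM judge: prefers the longer response."""
    return "A" if len(response_a) >= len(response_b) else "B"


toy_items: List[PairItem] = [
    ("Name a prime.", "7", "Eight is prime.", "A"),
    ("Greet the user.", "Hi.", "Hello, how can I help?", "B"),
    ("Define pi.", "The ratio of a circle's circumference to its diameter.", "3", "A"),
    ("Say yes.", "No, I refuse to do that.", "Yes.", "B"),
]

if __name__ == "__main__":
    print(judge_accuracy(length_judge, toy_items))
```

In a real setting the stand-in judge would be replaced by a call to the model being benchmarked, and the reference labels would come from human annotators or a preference oracle.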

Abstract

Alignment with human preferences is an important evaluation aspect of large language models (LLMs), requiring them to be helpful, honest, safe, and to precisely follow human instructions. Evaluating LLMs' alignment typically involves directly assessing their open-ended responses, requiring human annotators or strong LLM judges. Conversely, LLMs themselves have also been extensively evaluated as judges for assessing alignment. In this work, we examine the relationship between LLMs' generation and evaluation capabilities in aligning with human preferences. To this end, we first conduct a comprehensive analysis of the generation-evaluation consistency (GE-consistency) among various LLMs, revealing a strong correlation between their generation and evaluation capabilities when evaluated by a strong LLM preference oracle. Utilizing this finding, we propose a benchmarking paradigm that…
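The GE-consistency analysis mentioned in the abstract compares how models rank as generators with how they rank as evaluators. The sketch below assumes this is measured with a Spearman rank correlation over per-model scores; the visible part of the abstract does not name the statistic, so that choice is an assumption. The model names and scores are invented for illustration and are not results from the paper.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-model scores (toy values, not from the paper).
generation_scores: Dict[str, float] = {
    "model_a": 0.62, "model_b": 0.48, "model_c": 0.71, "model_d": 0.35,
}
evaluation_scores: Dict[str, float] = {
    "model_a": 0.80, "model_b": 0.74, "model_c": 0.77, "model_d": 0.60,
}

models = sorted(generation_scores)
rho = spearman(
    [generation_scores[m] for m in models],
    [evaluation_scores[m] for m in models],
)

if __name__ == "__main__":
    print(round(rho, 3))
```

A value near 1 would indicate that models rank similarly as generators and as evaluators; with only four toy models the number carries no statistical weight.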

Videos

On Evaluating LLM Alignment by Evaluating LLMs as Judges · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification