TL;DR
This paper introduces LAGER, a plug-and-play framework that improves how well LLM judges align with human judgments by using internal representations from multiple layers. It requires no fine-tuning and leaves the inference procedure unchanged, yet yields more accurate evaluations.
Contribution
LAGER leverages cross-layer internal representations to improve LLM-as-a-judge alignment, outperforming existing methods without fine-tuning or reasoning steps.
Findings
LAGER improves Spearman correlation by up to 7.5% on benchmarks.
It matches or exceeds reasoning-based methods without reasoning steps.
It also generalizes well to downstream tasks.
Abstract
The growing scale of evaluation tasks has led to the widespread adoption of automated evaluation using LLMs, a paradigm known as "LLM-as-a-judge". However, improving its alignment with human preferences without complex prompts or fine-tuning remains challenging. Previous studies mainly optimize based on shallow outputs, overlooking rich cross-layer representations. In this work, motivated by preliminary findings that middle-to-upper layers encode semantically and task-relevant representations that are often more aligned with human judgments than the final layer, we propose LAGER, a post-hoc, plug-and-play framework for improving the alignment of LLM-as-a-Judge point-wise evaluations with human scores by leveraging internal representations. LAGER produces fine-grained judgment scores by aggregating cross-layer score-token logits and computing the expected score from a softmax-based…
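The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each layer has already been projected to logits over the candidate score tokens (here scores 1 to 5), combines the layers with a weighted sum, and takes the expected score under a softmax. The function name `expected_score`, the uniform default weights, and the toy logits are all hypothetical, and the abstract's exact weighting scheme is truncated above, so the sketch does not attempt to reproduce it.

```python
import math

def expected_score(layer_logits, scores, layer_weights=None):
    """Aggregate per-layer score-token logits and return the expected score.

    layer_logits: list of rows, one per layer; each row holds one logit per
                  candidate score token (hypothetical input format).
    scores:       numeric value of each candidate score token, e.g. [1, 2, 3, 4, 5].
    layer_weights: optional per-layer weights; uniform if omitted (an assumption).
    """
    num_layers = len(layer_logits)
    if layer_weights is None:
        layer_weights = [1.0 / num_layers] * num_layers

    # Weighted sum of logits across layers, one aggregated logit per score token.
    aggregated = [
        sum(w * row[j] for w, row in zip(layer_weights, layer_logits))
        for j in range(len(scores))
    ]

    # Numerically stable softmax over the aggregated logits.
    peak = max(aggregated)
    exps = [math.exp(a - peak) for a in aggregated]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Expected score: probability-weighted average of the score values.
    return sum(p * s for p, s in zip(probs, scores))


# Toy example: two layers that both favor a score of 4.
toy_logits = [
    [0.1, 0.3, 1.0, 2.5, 1.2],
    [0.0, 0.2, 0.8, 3.0, 1.5],
]
fine_grained = expected_score(toy_logits, scores=[1, 2, 3, 4, 5])
```

Because the result is an expectation rather than the single most likely token, it is a real number between the lowest and highest score, which is what makes the judgment fine-grained.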