Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments
Han Zhou, Xingchen Wan, Yinhong Liu, Nigel Collier, Ivan Vulić, Anna Korhonen

TL;DR
This paper identifies biases in LLM-based evaluators and introduces ZEPO, a zero-shot prompt optimization method that enhances fairness and alignment with human judgments in language quality assessments.
Contribution
The paper proposes ZEPO, a novel zero-shot prompt optimization framework that improves fairness and human alignment of LLM evaluators without needing labeled data.
Findings
ZEPO significantly outperforms state-of-the-art evaluators.
Fairer preferences lead to better human alignment.
ZEPO requires no labeled data for optimization.
Abstract
Large language models (LLMs) have shown promising abilities as cost-effective and reference-free evaluators for assessing language generation quality. In particular, pairwise LLM evaluators, which compare two generated texts and determine the preferred one, have been employed in a wide range of applications. However, LLMs exhibit preference biases and worrying sensitivity to prompt designs. In this work, we first reveal that the predictive preference of LLMs can be highly brittle and skewed, even with semantically equivalent instructions. We find that fairer predictive preferences from LLMs consistently lead to judgments that are better aligned with humans. Motivated by this phenomenon, we propose an automatic Zero-shot Evaluation-oriented Prompt Optimization framework, ZEPO, which aims to produce fairer preference decisions and improve the alignment of LLM evaluators with human…
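The abstract links fairer preference distributions to better human alignment, and it uses that link to choose prompts without labels. The Python sketch below shows one way such a label-free selection loop could look. It is not the paper's method or objective. Its main assumption is that over many randomly ordered pairs, an unbiased pairwise evaluator should pick each position about equally often. Fairness here is a hypothetical score: 1 minus twice the distance between the share of "A" decisions and 0.5. The names `preference_fairness`, `select_fairest_prompt`, and `toy_evaluator` are hypothetical too. `toy_evaluator` stands in for a real LLM call.

```python
from collections.abc import Callable, Sequence

# Hypothetical evaluator signature: (prompt, text_a, text_b) -> "A" or "B".
Evaluator = Callable[[str, str, str], str]


def preference_fairness(decisions: Sequence[str]) -> float:
    """Illustrative fairness score in [0, 1] (an assumption, not ZEPO's exact metric).

    Returns 1.0 when "A" and "B" are chosen equally often and 0.0 when every
    decision picks the same side.
    """
    if not decisions:
        raise ValueError("need at least one decision")
    share_a = sum(d == "A" for d in decisions) / len(decisions)
    return 1.0 - 2.0 * abs(share_a - 0.5)


def select_fairest_prompt(
    prompts: Sequence[str],
    pairs: Sequence[tuple[str, str]],
    evaluator: Evaluator,
) -> tuple[str, float]:
    """Zero-shot selection: keep the candidate prompt whose decisions are most balanced.

    No human labels are used; only the evaluator's own decisions on unlabeled pairs.
    """
    if not prompts:
        raise ValueError("need at least one candidate prompt")
    best_prompt, best_score = prompts[0], float("-inf")
    for prompt in prompts:
        decisions = [evaluator(prompt, a, b) for a, b in pairs]
        score = preference_fairness(decisions)
        if score > best_score:  # strict ">" keeps the earliest prompt on ties
            best_prompt, best_score = prompt, score
    return best_prompt, best_score


def toy_evaluator(prompt: str, text_a: str, text_b: str) -> str:
    """Hypothetical stand-in for an LLM judge, used only to exercise the loop."""
    if prompt == "biased":
        return "A"  # simulates a pure position bias: always prefers the first text
    return "A" if len(text_a) >= len(text_b) else "B"  # simulates a content-based choice


# Toy unlabeled pairs (illustrative only).
PAIRS = [
    ("short", "a much longer answer"),
    ("a longer answer here", "tiny"),
    ("abc", "abcdef"),
    ("abcdef", "abc"),
]

if __name__ == "__main__":
    print(select_fairest_prompt(["biased", "length-aware"], PAIRS, toy_evaluator))
```

In this toy run, the "biased" prompt always answers "A" and scores 0.0. The "length-aware" prompt splits its decisions evenly and scores 1.0, so the loop selects it. A real optimizer would also generate new candidate prompts (for example, by having an LLM paraphrase the instruction) and would calibrate for the true class balance of the data. Both steps are left out of this sketch.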
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
