TL;DR
This paper introduces UDA, an unsupervised framework that reduces bias in pairwise LLM evaluations by aligning judges' ratings towards a collective consensus, improving consistency and correlation with human judgments.
Contribution
UDA is an unsupervised method that dynamically adjusts each judge's Elo rating updates to minimize inter-judge disagreement, stabilizing evaluations without requiring labeled data.
Findings
Reduces inter-judge rating standard deviation by up to 63.4%.
Improves average correlation with human judgments by 24.7%.
Elevates performance of weaker judges to match stronger ones.
Abstract
Pairwise evaluation of Large Language Models (LLMs) is a common paradigm, but it is prone to preference bias, where judges systematically favor certain outputs, such as their own. This bias leads to inconsistent and skewed rankings across different judges. To address this, we first empirically demonstrate significant and heterogeneous biases in cross-model evaluations. We then propose UDA (Unsupervised Debiasing Alignment), a framework that reduces inter-judge disagreement by dynamically adjusting the Elo rating system. For each pairwise comparison, a compact neural network learns to adaptively set the K-factor and refine win probabilities. Crucially, UDA operates in a fully unsupervised manner, guided solely by the objective of minimizing the dispersion among the Elo trajectories of all judges. This forces an alignment towards a collective consensus, which serves as an unsupervised…
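To make the moving parts in the abstract concrete, here is a minimal sketch of the standard Elo update and of one plausible way to measure dispersion across judges. It is not the authors' implementation. The function names, the fixed `k` argument (a stand-in for the per-comparison K-factor that UDA's network would predict), and the choice of population standard deviation as the dispersion measure are all assumptions made for illustration.

```python
from statistics import pstdev


def expected_score(r_a, r_b):
    """Standard Elo win probability of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=32.0):
    """One Elo update. score_a is 1.0 (A wins), 0.5 (tie), or 0.0 (A loses).

    k is a plain argument here; in UDA it would be set adaptively per
    comparison by a learned model (assumption: not shown in this sketch).
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def inter_judge_dispersion(ratings_by_judge):
    """Mean over models of the std. dev. of that model's rating across judges.

    ratings_by_judge maps judge name -> {model name -> Elo rating}.
    This is a hypothetical dispersion measure, chosen for illustration.
    """
    models = list(next(iter(ratings_by_judge.values())).keys())
    per_model = [
        pstdev([ratings[m] for ratings in ratings_by_judge.values()])
        for m in models
    ]
    return sum(per_model) / len(per_model)
```

For example, `elo_update(1500, 1500, 1.0, k=32)` moves the winner up and the loser down by the same amount, and `inter_judge_dispersion` returns zero when every judge assigns identical ratings. An unsupervised objective of the kind the abstract describes would push this dispersion value down without consulting human labels.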