Exploring the Effects of Alignment on Numerical Bias in Large Language Models
Ayako Sato, Hwichan Kim, Zhousi Chen, Masato Mita, Mamoru Komachi

TL;DR
This paper investigates how alignment techniques in large language models cause numerical bias in evaluation scores, and proposes score range adjustment as an effective mitigation strategy.
Contribution
It identifies the link between alignment and increased numerical bias in LLM evaluators and evaluates mitigation methods, highlighting score range adjustment as most effective.
Findings
Alignment increases numerical bias in LLM evaluators.
Score range adjustment reduces bias and improves evaluation performance.
Mitigation strategies need further refinement for robustness.
Abstract
"LLM-as-a-judge," which utilizes large language models (LLMs) as evaluators, has proven effective in many evaluation tasks. However, evaluator LLMs exhibit numerical bias, a phenomenon where certain evaluation scores are generated disproportionately often, leading reduced evaluation performance. This study investigates the cause of this bias. Given that most evaluator LLMs are aligned through instruction tuning and preference tuning, and that prior research suggests alignment reduces output diversity, we hypothesize that numerical bias arises from alignment. To test this, we compare outputs from pre- and post-alignment LLMs, and observe that alignment indeed increases numerical bias. We also explore mitigation strategies for post-alignment LLMs, including temperature scaling, distribution calibration, and score range adjustment. Among these, score range adjustment is most effective in…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Explainable Artificial Intelligence (XAI)
