Fairness or Fluency? An Investigation into Language Bias of Pairwise LLM-as-a-Judge

Xiaolin Zhou; Zheng Luo; Yicheng Gao; Qixuan Chen; Xiyang Hu; Yue Zhao; Ruishan Liu

arXiv:2601.13649 · cs.CL · January 21, 2026

Open Access

TL;DR

This paper investigates language bias in pairwise LLM-as-a-judge. It finds significant performance disparities across languages and shows that perplexity only partially explains the bias, which poses a challenge for fair automated evaluation.

Contribution

The study systematically analyzes language bias in LLM-as-a-judge, identifying performance disparities and the influence of answer language, and explores the relationship with perplexity bias.

Findings

01. Judges perform better on European languages than on African languages in same-language judging

02. Models favor English answers in inter-language comparisons

03. Language bias is only partially explained by perplexity
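
For readers unfamiliar with the term, perplexity is conventionally defined as the exponential of the negative mean token log-probability under a language model. The sketch below shows only that standard definition; the function name `perplexity` and the plain list of log-probabilities are illustrative assumptions, and this page does not state how the paper itself computes perplexity.

```python
import math

def perplexity(token_logprobs):
    """Standard perplexity: exp of the negative mean natural-log probability.

    `token_logprobs` is a hypothetical input: one natural-log probability
    per token, as returned by some language model (not specified here).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

A text whose tokens each receive probability 0.5 has perplexity 2. Lower values mean the model finds the text more predictable.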

Abstract

Recent advances in Large Language Models (LLMs) have incentivized the development of LLM-as-a-judge, an application of LLMs where they are used as judges to decide the quality of a certain piece of text given a certain context. However, previous studies have demonstrated that LLM-as-a-judge can be biased towards different aspects of the judged texts, which often do not align with human preference. One of the identified biases is language bias, which indicates that the decision of LLM-as-a-judge can differ based on the language of the judged texts. In this paper, we study two types of language bias in pairwise LLM-as-a-judge: (1) performance disparity between languages when the judge is prompted to compare options from the same language, and (2) bias towards options written in major languages when the judge is prompted to compare options of two different languages. We find that for…
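
The two bias types described in the abstract could be quantified roughly as follows. This is a minimal sketch under stated assumptions, not the paper's metrics, which the truncated abstract does not give: the record fields (`language`, `judge_choice`, `human_choice`, `lang_a`, `lang_b`, `chosen_lang`) and both function names are hypothetical.

```python
from collections import defaultdict

def same_language_accuracy(records):
    """Type (1): per-language agreement between judge and human preference.

    Each record is a hypothetical dict with keys 'language',
    'judge_choice', and 'human_choice'; both options share one language.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["language"]] += 1
        if r["judge_choice"] == r["human_choice"]:
            correct[r["language"]] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

def language_preference_rate(records, language):
    """Type (2): how often the judge picks the option written in `language`.

    Each record is a hypothetical dict with keys 'lang_a', 'lang_b', and
    'chosen_lang'. Only comparisons between two different languages that
    involve `language` are counted. Returns None if there are none.
    """
    relevant = [
        r for r in records
        if r["lang_a"] != r["lang_b"] and language in (r["lang_a"], r["lang_b"])
    ]
    if not relevant:
        return None
    wins = sum(1 for r in relevant if r["chosen_lang"] == language)
    return wins / len(relevant)
```

A preference rate is only informative about bias when the two options are of comparable quality, for example when one is a translation of the other; otherwise a rate far from 0.5 may simply reflect better answers.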

Taxonomy

Topics: Artificial Intelligence in Law · Topic Modeling · Computational and Text Analysis Methods