Bilingual Bias in Large Language Models: A Taiwan Sovereignty Benchmark Study
Ju-Chun Ko

TL;DR
This study systematically evaluates 17 large language models for bilingual political bias regarding Taiwan's sovereignty, revealing significant language-dependent biases and proposing new metrics for measurement.
Contribution
It introduces a novel bilingual benchmark and metrics for quantifying bias, and it highlights the severity of language bias in LLMs on sensitive political topics.
Findings
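15 of the 17 tested models exhibit measurable language bias
Chinese-origin models show the most severe issues, including complete refusal to answer or explicit propagation of CCP narratives
Only GPT-4o Mini achieves a perfect 10/10 score in both languages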
Abstract
Large Language Models (LLMs) are increasingly deployed in multilingual contexts, yet their consistency across languages on politically sensitive topics remains understudied. This paper presents a systematic bilingual benchmark study examining how 17 LLMs respond to questions concerning the sovereignty of the Republic of China (Taiwan) when queried in Chinese versus English. We discover significant language bias, the phenomenon in which the same model produces substantively different political stances depending on the query language. Our findings reveal that 15 out of 17 tested models exhibit measurable language bias, with Chinese-origin models showing particularly severe issues, including complete refusal to answer or explicit propagation of Chinese Communist Party (CCP) narratives. Notably, only GPT-4o Mini achieves a perfect 10/10 score in both languages. We propose novel metrics for…
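The abstract excerpt is cut off before the paper's metrics are defined, so they are not reproduced here. As a rough illustration of the idea of language bias on a 0-10 scale, the sketch below computes a per-model gap between the Chinese and English scores. The names `BilingualScore`, `language_gap`, and `count_biased`, the gap definition, and the toy values are all assumptions made for illustration; none of them come from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BilingualScore:
    """Scores for one model on a 0-10 scale (hypothetical container, not from the paper)."""
    model: str
    zh: float  # score when queried in Chinese
    en: float  # score when queried in English


def language_gap(score: BilingualScore) -> float:
    """Absolute difference between the Chinese and English scores (assumed metric)."""
    return abs(score.zh - score.en)


def count_biased(scores: Iterable[BilingualScore], threshold: float = 0.0) -> int:
    """Count models whose language gap strictly exceeds the threshold."""
    return sum(1 for s in scores if language_gap(s) > threshold)


# Placeholder values for illustration only; these are NOT results from the paper.
toy = [
    BilingualScore("model_a", zh=10.0, en=10.0),
    BilingualScore("model_b", zh=4.0, en=9.0),
    BilingualScore("model_c", zh=0.0, en=7.0),
]

for s in toy:
    print(f"{s.model}: gap = {language_gap(s)}")
print("models with a nonzero gap:", count_biased(toy))
```

Under this assumed definition, a model with identical scores in both languages has a gap of zero, but a zero gap alone does not imply a perfect score; the two per-language scores still need to be reported alongside it.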
Taxonomy
Topics: Computational and Text Analysis Methods · Big Data and Digital Economy · Topic Modeling
