Delving into Multilingual Ethical Bias: The MSQAD with Statistical Hypothesis Tests for Large Language Models

Seunguk Yu; Juhwan Choi; Youngbin Kim

arXiv:2505.19121 · cs.CL · July 3, 2025

TL;DR

This paper investigates ethical biases in large language models across multiple languages using the newly created MSQAD dataset and statistical hypothesis tests, revealing widespread cross-language biases.

Contribution

Introduces the MSQAD dataset for multilingual bias analysis and applies statistical tests to demonstrate the prevalence of ethical biases across languages and models.

Findings

01. Biases are prevalent across different languages and topics.

02. Cross-language differences significantly contribute to ethical biases.

03. Biases are consistent across various large language models.

Abstract

Despite the recent strides in large language models, studies have underscored the existence of social biases within these systems. In this paper, we delve into the validation and comparison of the ethical biases of LLMs concerning globally discussed and potentially sensitive topics, hypothesizing that these biases may arise from language-specific distinctions. Introducing the Multilingual Sensitive Questions & Answers Dataset (MSQAD), we collected news articles from Human Rights Watch covering 17 topics, and generated socially sensitive questions along with corresponding responses in multiple languages. We scrutinized the biases of these responses across languages and topics, employing two statistical hypothesis tests. The results showed that the null hypotheses were rejected in most cases, indicating biases arising from cross-language differences. It demonstrates that ethical biases in…
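The abstract does not name the two hypothesis tests, so the following is only a minimal sketch of one common choice for this kind of comparison: a chi-square test of homogeneity on a table of response labels per language. The null hypothesis is that the label distribution is the same in every language. The label categories, the counts, the 0.05 threshold, and the function names `chi_square_homogeneity` and `chi2_sf_even_dof` are all assumptions made for illustration and are not taken from the paper or its repository.

```python
import math


def chi_square_homogeneity(table):
    """Return (statistic, degrees_of_freedom) for a rows x columns count table.

    Rows are groups (here: languages), columns are response categories.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the category distribution did not depend on the row.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


def chi2_sf_even_dof(x, dof):
    """Chi-square survival function P(X >= x), closed form valid only for even dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form only holds for positive even degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))


# Hypothetical counts, NOT results from the paper.
# Columns: [acceptable responses, non-acceptable responses]
counts = [
    [80, 20],  # hypothetical language A
    [60, 40],  # hypothetical language B
    [50, 50],  # hypothetical language C
]

stat, dof = chi_square_homogeneity(counts)
p_value = chi2_sf_even_dof(stat, dof)
reject_null = p_value < 0.05  # assumed significance level

print(f"chi2={stat:.3f}, dof={dof}, p={p_value:.2e}, reject H0: {reject_null}")
```

Rejecting the null hypothesis here would only say that the response distributions differ between languages; it would not say which language differs or why. For odd degrees of freedom the closed form above does not apply, and a statistics library would be the more practical route.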

Code & Models

Repositories

seungukyu/msqad (Official)
