MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs
Vera Neplenbroek, Arianna Bisazza, Raquel Fern\'andez

TL;DR
This paper introduces MBBQ, a multilingual dataset for comparing stereotypes in generative LLMs across languages, revealing biases vary with language and model accuracy, highlighting the need for bias mitigation in multilingual AI.
Contribution
The paper presents MBBQ, a new multilingual bias benchmark extending the BBQ dataset to Dutch, Spanish, and Turkish, enabling cross-lingual bias analysis in LLMs.
Findings
Bias varies across languages, often more in non-English.
Significant cross-lingual differences in bias behavior.
Bias correlates with model accuracy.
Abstract
Generative large language models (LLMs) have been shown to exhibit harmful biases and stereotypes. While safety fine-tuning typically takes place in English, if at all, these models are being used by speakers of many different languages. There is existing evidence that the performance of these models is inconsistent across languages and that they discriminate based on demographic factors of the user. Motivated by this, we investigate whether the social stereotypes exhibited by LLMs differ as a function of the language used to prompt them, while controlling for cultural differences and task accuracy. To this end, we present MBBQ (Multilingual Bias Benchmark for Question-answering), a carefully curated version of the English BBQ dataset extended to Dutch, Spanish, and Turkish, which measures stereotypes commonly held across these languages. We further complement MBBQ with a parallel…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Wikis in Education and Collaboration · Text Readability and Simplification
