Bias in, Bias out: Annotation Bias in Multilingual Large Language Models

Xia Cui; Ziyi Huang; Naeemeh Adel

arXiv:2511.14662·cs.CL·November 19, 2025

TL;DR

This paper analyzes the main types of annotation bias in multilingual NLP datasets, proposing a comprehensive framework, reviewing detection methods, and outlining mitigation strategies to promote fairer and more culturally sensitive large language models.

Contribution

It introduces a detailed typology of annotation bias, reviews detection techniques, and presents an ensemble-based mitigation approach tailored for multilingual contexts.

Findings

01. Identified key sources of annotation bias in multilingual datasets

02. Reviewed effective detection metrics for annotation bias

03. Proposed an ensemble-based bias mitigation method

Abstract

Annotation bias in NLP datasets remains a major challenge for developing multilingual Large Language Models (LLMs), particularly in culturally diverse settings. Bias from task framing, annotator subjectivity, and cultural mismatches can distort model outputs and exacerbate social harms. We propose a comprehensive framework for understanding annotation bias, distinguishing among instruction bias, annotator bias, and contextual and cultural bias. We review detection methods (including inter-annotator agreement, model disagreement, and metadata analysis) and highlight emerging techniques such as multilingual model divergence and cultural inference. We further outline proactive and reactive mitigation strategies, including diverse annotator recruitment, iterative guideline refinement, and post-hoc model adjustments. Our contributions include: (1) a typology of annotation bias; (2) a…
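
One of the detection signals named in the abstract, inter-annotator agreement, can be illustrated with Cohen's kappa computed separately for each language. The following is a minimal sketch, not the authors' implementation: the function names `cohen_kappa` and `kappa_by_language`, the `(language, label_a, label_b)` record layout, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def kappa_by_language(records):
    """Group (language, label_a, label_b) records and score each language."""
    grouped = defaultdict(lambda: ([], []))
    for language, label_a, label_b in records:
        grouped[language][0].append(label_a)
        grouped[language][1].append(label_b)
    return {lang: cohen_kappa(a, b) for lang, (a, b) in grouped.items()}


# Hypothetical toy annotations, invented for this example only.
records = [
    ("en", "toxic", "toxic"), ("en", "ok", "ok"),
    ("en", "ok", "ok"), ("en", "toxic", "toxic"),
    ("sw", "toxic", "ok"), ("sw", "ok", "toxic"),
    ("sw", "toxic", "toxic"), ("sw", "ok", "ok"),
]
scores = kappa_by_language(records)
```

In this toy data the two annotators agree perfectly on the `en` items and only at chance level on the `sw` items. A gap of that kind between languages is a prompt to inspect the guidelines and the annotator pool for that language; it does not by itself establish that bias is present.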

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education