Textual Data Bias Detection and Mitigation -- An Extensible Pipeline with Experimental Evaluation

Rebekka Görge; Sujan Sai Gannamaneni; Tabea Naeven; Hammam Abdelwahab; Héctor Allende-Cid; Armin B. Cremers; Lennard Helmer; Michael Mock; Anna Schmitz; Songkai Xue; Elif Yildirir; Maximilian Poretschkin; Stefan Wrobel

arXiv:2512.10734 · cs.CL · December 15, 2025

TL;DR

This paper presents an extensible pipeline that detects and mitigates two types of bias in textual training data for large language models, representation bias and explicit stereotypes, and evaluates it experimentally on both the data and the resulting model bias benchmarks.

Contribution

The paper introduces a configurable pipeline that combines LLM-generated word lists for detecting group labels, sociolinguistic filtering of stereotypes, and counterfactual augmentation to mitigate bias in textual datasets.
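To make the counterfactual augmentation step concrete, here is a minimal sketch of the general technique: each text that mentions a group term gets a copy with the term swapped for its counterpart. The swap list `SWAP_PAIRS` and the function names are hypothetical illustrations, not the paper's word lists or implementation, and the example assumes a simple binary attribute with unambiguous word pairs.

```python
import re

# Hypothetical swap pairs for illustration only; the paper derives its
# word lists with an LLM and quality criteria not shown in this summary.
SWAP_PAIRS = [("he", "she"), ("man", "woman"), ("father", "mother")]


def build_swap_map(pairs):
    """Return a symmetric lookup so each term maps to its counterpart."""
    swap = {}
    for a, b in pairs:
        swap[a] = b
        swap[b] = a
    return swap


def counterfactual(text, swap_map):
    """Swap every whole-word group term in `text`, preserving capitalization."""
    # Longest terms first so the alternation never prefers a shorter prefix.
    terms = sorted(swap_map, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        word = match.group(0)
        target = swap_map[word.lower()]
        if word.isupper():
            return target.upper()
        if word[0].isupper():
            return target.capitalize()
        return target

    return pattern.sub(replace, text)


def augment(corpus, swap_map):
    """Keep each original text and add its counterfactual if it differs."""
    augmented = []
    for text in corpus:
        augmented.append(text)
        swapped = counterfactual(text, swap_map)
        if swapped != text:
            augmented.append(swapped)
    return augmented
```

Calling `augment(["The man left.", "It rained."], build_swap_map(SWAP_PAIRS))` keeps both originals and adds one swapped copy of the first sentence. Real word lists need more care than this sketch shows, for example ambiguous forms such as "her", which can map to either "him" or "his".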

Findings

01. The pipeline reduces representation bias and stereotypes in datasets.

02. Debiased data does not always improve model results on bias benchmarks.

03. The results highlight gaps in current bias evaluation methodologies.

Abstract

Textual data used to train large language models (LLMs) exhibits multifaceted bias manifestations encompassing harmful language and skewed demographic distributions. Regulations such as the European AI Act require identifying and mitigating biases against protected groups in data, with the ultimate goal of preventing unfair model outputs. However, practical guidance and operationalization are lacking. We propose a comprehensive data bias detection and mitigation pipeline comprising four components that address two data bias types, namely representation bias and (explicit) stereotypes for a configurable sensitive attribute. First, we leverage LLM-generated word lists created based on quality criteria to detect relevant group labels. Second, representation bias is quantified using the Demographic Representation Score. Third, we detect and mitigate stereotypes using sociolinguistically…
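The first two pipeline components (word-list-based group detection and a representation score) can be sketched as follows. This excerpt does not define the Demographic Representation Score, so the sketch assumes one common formulation: the total variation distance between the observed distribution of group mentions and a uniform distribution. The word lists in `GROUP_WORDS` and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration; the paper generates its lists
# with an LLM under quality criteria not reproduced here.
GROUP_WORDS = {
    "female": {"she", "woman", "women"},
    "male": {"he", "man", "men"},
}


def count_group_mentions(texts, group_words):
    """Count how many tokens in `texts` belong to each group's word list."""
    counts = Counter({group: 0 for group in group_words})
    for text in texts:
        for token in re.findall(r"[a-z']+", text.lower()):
            for group, words in group_words.items():
                if token in words:
                    counts[group] += 1
    return dict(counts)


def representation_gap(counts):
    """Total variation distance between group shares and a uniform split.

    Assumed formulation: 0.0 means all groups are mentioned equally often;
    larger values mean a more skewed distribution.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no group mentions, so nothing to measure
    uniform = 1.0 / len(counts)
    return 0.5 * sum(abs(c / total - uniform) for c in counts.values())
```

For instance, `count_group_mentions(["He said the man left.", "She stayed."], GROUP_WORDS)` finds two male and one female mention, and `representation_gap` on those counts returns 1/6. Exact token matching is a simplification; it ignores context, so a word list entry that is also a common non-demographic word would inflate the counts.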

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education