GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations

Rick Wilming; Artur Dox; Hjalmar Schulz; Marta Oliveira; Benedict Clark; Stefan Haufe

arXiv:2406.11547 · cs.LG · January 22, 2026

TL;DR

This paper introduces GECOBench, a gender-controlled dataset and evaluation framework for quantifying gender bias in the explanations that XAI methods produce for NLP models, and uses it to show how fine-tuning affects that bias.

Contribution

The paper presents a gender-controlled dataset and benchmark for evaluating explanation bias in language models, and analyzes how fine-tuning affects bias in feature attributions.

Findings

01. Fine-tuning reduces explanation bias in XAI methods.
02. Explanation performance improves with more fine-tuned layers.
03. GECOBench enables objective evaluation of bias in explanations.

Abstract

Large pre-trained language models have become a crucial backbone for many downstream tasks in natural language processing (NLP), and while they are trained on a plethora of data containing a variety of biases, such as gender biases, it has been shown that they can also inherit such biases in their weights, potentially affecting their prediction behavior. However, it is unclear to what extent these biases also affect feature attributions generated by applying "explainable artificial intelligence" (XAI) techniques, possibly in unfavorable ways. To systematically study this question, we create a gender-controlled text dataset, GECO, in which the alteration of grammatical gender forms induces class-specific words and provides ground truth feature attributions for gender classification tasks. This enables an objective evaluation of the correctness of XAI methods. We apply this dataset to the…
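The idea of a gender-controlled sentence pair with ground-truth attributions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the swap table, the function names `make_gendered_pair` and `attribution_mass_on_truth`, the toy attribution values, and the scoring rule are hypothetical and are not taken from the GECO dataset or the GECOBench code. The sketch only shows why swapping gendered words yields a known set of class-specific tokens against which an attribution map can be scored.

```python
from typing import List, Tuple

# Hypothetical swap table for illustration only; this is NOT the
# vocabulary or the procedure used by the GECO authors.
MALE_TO_FEMALE = {
    "he": "she",
    "himself": "herself",
    "man": "woman",
    "father": "mother",
}


def make_gendered_pair(tokens: List[str]) -> Tuple[List[str], List[str], List[int]]:
    """Return (male_version, female_version, ground_truth_mask).

    The mask is 1 at every position where the two versions differ,
    i.e. at the only tokens that carry class information.
    """
    male, female, mask = [], [], []
    for tok in tokens:
        key = tok.lower()
        if key in MALE_TO_FEMALE:
            male.append(key)
            female.append(MALE_TO_FEMALE[key])
            mask.append(1)
        else:
            male.append(tok)
            female.append(tok)
            mask.append(0)
    return male, female, mask


def attribution_mass_on_truth(attributions: List[float], mask: List[int]) -> float:
    """Share of absolute attribution mass that falls on ground-truth tokens.

    An assumed, simple scoring rule: 1.0 means all attribution lies on
    the swapped words, 0.0 means none does.
    """
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return 0.0
    on_truth = sum(abs(a) for a, m in zip(attributions, mask) if m)
    return on_truth / total


tokens = "he told the man that he blamed himself".split()
male, female, mask = make_gendered_pair(tokens)

# Made-up attribution scores standing in for the output of an XAI method.
toy_attr = [0.4, 0.1, 0.0, 0.2, 0.0, 0.2, 0.0, 0.1]
score = attribution_mass_on_truth(toy_attr, mask)

print(" ".join(female))
print(mask)
print(round(score, 3))
```

Because the two sentences differ only at the masked positions, any attribution placed on other tokens cannot reflect class-relevant evidence, which is what makes an objective score possible in this toy setting.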

Code & Models

Repositories

braindatalab/gecobench (PyTorch, official)
