Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods

Mahdi Dhaini; Ege Erdogan; Nils Feldhus; Gjergji Kasneci

arXiv:2505.01198 · cs.CL · May 5, 2025

TL;DR

This study reveals significant gender disparities in the performance of widely used post-hoc explanation methods across multiple NLP tasks and models, emphasizing the need for fairness considerations in explainability.

Contribution

It shows that these explanation disparities are not merely a consequence of biased training data, highlighting a critical and often overlooked aspect of fairness in model interpretability.

Findings

1. Gender disparities in the faithfulness, robustness, and complexity of explanations.
2. Disparities persist even when models are pre-trained or fine-tuned on unbiased datasets.
3. Implications for fairness in high-stakes AI applications.

Abstract

While research on applications and evaluations of explanation methods continues to expand, fairness of the explanation methods concerning disparities in their performance across subgroups remains an often overlooked aspect. In this paper, we address this gap by showing that, across three tasks and five language models, widely used post-hoc feature attribution methods exhibit significant gender disparity with respect to their faithfulness, robustness, and complexity. These disparities persist even when the models are pre-trained or fine-tuned on particularly unbiased datasets, indicating that the disparities we observe are not merely consequences of biased training data. Our results highlight the importance of addressing disparities in explanations when developing and applying explainability methods, as these can lead to biased outcomes against certain subgroups, with particularly…
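To make the notion of a performance disparity concrete, the sketch below scores each explanation with an entropy-based complexity metric (the entropy of the normalized absolute attribution scores, one common way to quantify how spread out an explanation is) and compares the two groups' means with a permutation test. This is a minimal, illustrative sketch under stated assumptions, not the authors' implementation: the helper names (`complexity`, `disparity`, `permutation_test`), the choice of a permutation test, and the attribution vectors are all hypothetical, and the paper's exact metrics, aggregation, and significance tests may differ.

```python
import math
import random
from statistics import mean


def complexity(attributions):
    """Entropy of normalized absolute attribution scores (higher = less concise).

    Assumption: one common entropy-based complexity definition, used here for illustration.
    """
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return 0.0
    probs = [abs(a) / total for a in attributions]
    return -sum(p * math.log(p) for p in probs if p > 0)


def disparity(scores_a, scores_b):
    """Absolute difference in the mean metric value between two subgroups."""
    return abs(mean(scores_a) - mean(scores_b))


def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the difference in group means.

    Assumption: a generic significance test chosen for this sketch, not taken from the paper.
    """
    rng = random.Random(seed)
    observed = disparity(scores_a, scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        if disparity(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical token-attribution vectors for inputs referring to each gender (illustrative only).
female_attr = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
male_attr = [[0.4, 0.3, 0.3], [0.35, 0.35, 0.3], [0.5, 0.25, 0.25]]

female_scores = [complexity(a) for a in female_attr]
male_scores = [complexity(a) for a in male_attr]
gap = disparity(female_scores, male_scores)
p_value = permutation_test(female_scores, male_scores, n_permutations=2000)
print(f"complexity gap = {gap:.3f}, p = {p_value:.3f}")
```

Per-example faithfulness or robustness scores could be passed to the same `disparity` and `permutation_test` helpers. With only three examples per group, as in this toy data, the test has very little statistical power, so a real comparison would need many more examples per group.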

Code & Models

Repositories

dmah10/fairness-explainable-nlp (PyTorch, official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)