Does Debiasing Inevitably Degrade the Model Performance

Yiran Liu; Xiao Liu; Haotian Chen; Yang Yu

arXiv:2211.07350 · cs.CL · June 13, 2023

TL;DR

This paper presents a theoretical framework for understanding gender bias in language models, uses it to explain why current debiasing methods often degrade performance, and introduces a causality-detection fine-tuning method that partially mitigates gender bias while avoiding performance degradation.

Contribution

The authors develop a theoretical explanation for bias mechanisms, identify when debiasing does not harm performance, and propose a causality-driven fine-tuning approach.

Findings

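01. The theoretical framework explains three candidate mechanisms of gender bias in language models.

02. A pathway exists through which debiasing does not degrade model performance.

03. Causality-detection fine-tuning partially mitigates gender bias while avoiding performance degradation.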

Abstract

Gender bias in language models has attracted considerable attention because it threatens social justice. However, most current debiasing methods degrade the model's performance on other tasks, and the mechanism behind this degradation is still poorly understood. We propose a theoretical framework that explains three candidate mechanisms of gender bias in language models. We use this framework to explain why current debiasing methods cause performance degradation. We also identify a pathway through which debiasing does not degrade model performance. We further develop a causality-detection fine-tuning approach to correct gender bias. Numerical experiments demonstrate that our method yields a double dividend: it partially mitigates gender bias while avoiding performance degradation.

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning