An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models
Nicholas Meade, Elinor Poole-Dayan, Siva Reddy

TL;DR
This paper empirically evaluates five bias mitigation techniques for pre-trained language models, analyzing their effectiveness across bias benchmarks and their impact on language modeling and downstream tasks.
Contribution
It compares five recently proposed debiasing methods, identifies Self-Debias as the most effective, and discusses the trade-offs between bias reduction and model performance.
Findings
Self-Debias outperforms other techniques on bias benchmarks
Debiasing methods are less effective for non-gender biases
Bias mitigation often reduces language modeling ability
Abstract
Recent work has shown pre-trained language models capture social biases from the large amounts of text they are trained on. This has attracted attention to developing techniques that mitigate such biases. In this work, we perform an empirical survey of five recently proposed bias mitigation techniques: Counterfactual Data Augmentation (CDA), Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias. We quantify the effectiveness of each technique using three intrinsic bias benchmarks while also measuring the impact of these techniques on a model's language modeling ability, as well as its performance on downstream NLU tasks. We experimentally find that: (1) Self-Debias is the strongest debiasing technique, obtaining improved scores on all bias benchmarks; (2) current debiasing techniques perform less consistently when mitigating non-gender biases; and (3) improvements on…
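To make one of the surveyed techniques concrete, here is a minimal sketch of the general idea behind Counterfactual Data Augmentation: each training sentence is paired with a copy in which bias attribute words are swapped. The word-pair list, function names, and casing rule are hypothetical choices made for illustration and are not taken from the paper. A real setup would use a much larger curated word list and handle ambiguous forms such as "her".

```python
import re

# Hypothetical, deliberately tiny word-pair list (an assumption for this sketch).
GENDER_PAIRS = [("he", "she"), ("his", "her"), ("man", "woman"), ("father", "mother")]


def build_swap_map(pairs):
    """Build a two-way lookup so each word maps to its counterpart."""
    swap = {}
    for a, b in pairs:
        swap[a] = b
        swap[b] = a
    return swap


def counterfactual(sentence, swap):
    """Return the sentence with every listed attribute word swapped."""
    def replace(match):
        word = match.group(0)
        target = swap.get(word.lower())
        if target is None:
            return word  # not an attribute word, keep unchanged
        # Preserve a leading capital letter (simplistic casing rule).
        return target.capitalize() if word[0].isupper() else target

    # Match whole alphabetic words so "man" inside "woman" is not touched.
    return re.sub(r"[A-Za-z]+", replace, sentence)


def augment(corpus, pairs=GENDER_PAIRS):
    """Keep each original sentence and add its counterfactual when it differs."""
    swap = build_swap_map(pairs)
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = counterfactual(sentence, swap)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented
```

Under these assumptions, `augment(["He thanked his father."])` returns the original sentence followed by "She thanked her mother.", and the augmented corpus would then be used for further training of the model.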
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Dropout
