Debiasing Methods in Natural Language Understanding Make Bias More Accessible

Michael Mendelson; Yonatan Belinkov

arXiv:2109.04095·cs.CL·September 10, 2021

TL;DR

This paper introduces a probing framework to interpret biases in language models and finds that debiasing efforts may inadvertently increase bias encoding within model representations.

Contribution

It presents a novel information-theoretic probing method to analyze biases in language models and reveals that debiasing can make biases more accessible in internal representations.

Findings

01

Debiasing can increase bias encoding in model representations.

02

A general probing-based framework allows post-hoc interpretation of biases in language models.

03

The result challenges the assumption that debiasing methods lead models to discover more robust features in their inner representations.

Abstract

Model robustness to bias is often determined by the generalization on carefully designed out-of-distribution datasets. Recent debiasing methods in natural language understanding (NLU) improve performance on such datasets by pressuring models into making unbiased predictions. An underlying assumption behind such methods is that this also leads to the discovery of more robust features in the model's inner representations. We propose a general probing-based framework that allows for post-hoc interpretation of biases in language models, and use an information-theoretic approach to measure the extractability of certain biases from the model's representations. We experiment with several NLU datasets and known biases, and show that, counter-intuitively, the more a language model is pushed towards a debiased regime, the more bias is actually encoded in its inner representations.
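To make "extractability" concrete, the sketch below estimates how many bits a probe needs to encode a bias label given a set of representations, using an online-code (prequential) description length. This is one common information-theoretic probing measure; the abstract does not specify the exact measure, so treating it as the one used here is an assumption. The softmax probe, the synthetic data, and all function names (`train_softmax_probe`, `codelength_bits`, `online_codelength`) are hypothetical and chosen for illustration only. A shorter codelength (higher compression) means the bias is more easily extracted from the representations.

```python
import numpy as np

def train_softmax_probe(X, y, num_classes, steps=300, lr=0.5, l2=1e-3):
    """Fit a linear softmax probe with plain gradient descent (illustrative probe)."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]                      # one-hot targets
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                              # gradient w.r.t. logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b

def codelength_bits(W, b, X, y):
    """Bits needed to encode labels y under the probe's predictive distribution."""
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(y)), y].sum() / np.log(2))

def online_codelength(X, y, num_classes, fractions=(0.1, 0.2, 0.4, 0.8, 1.0)):
    """Online code: send the first block uniformly, then each later block
    with a probe trained on everything sent so far."""
    n = len(y)
    cuts = [max(1, int(f * n)) for f in fractions]
    total = cuts[0] * np.log2(num_classes)           # first block, uniform code
    for start, end in zip(cuts[:-1], cuts[1:]):
        W, b = train_softmax_probe(X[:start], y[:start], num_classes)
        total += codelength_bits(W, b, X[start:end], y[start:end])
    uniform = n * np.log2(num_classes)
    return total, uniform / total                    # (bits, compression)

# Synthetic stand-ins for model representations (assumption: not real model outputs).
rng = np.random.default_rng(0)
n, d = 500, 16
bias_label = rng.integers(0, 2, size=n)
encoded = rng.normal(size=(n, d))
encoded[:, 0] += 3.0 * (2 * bias_label - 1)          # bias is linearly readable
unrelated = rng.normal(size=(n, d))                  # bias is not encoded

len_enc, comp_enc = online_codelength(encoded, bias_label, 2)
len_unr, comp_unr = online_codelength(unrelated, bias_label, 2)
print(f"bias encoded:     {len_enc:.1f} bits, compression {comp_enc:.2f}")
print(f"bias not encoded: {len_unr:.1f} bits, compression {comp_unr:.2f}")
```

In a setup like the one the abstract describes, the two inputs would be representations from a baseline model and from a debiased model, and the comparison of compression values would indicate in which model the bias is more extractable.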

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)