Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP

Timo Schick; Sahana Udupa; Hinrich Schütze

arXiv:2103.00453·cs.CL·September 10, 2021·1 cites

TL;DR

This paper introduces self-diagnosis and self-debiasing: a pretrained language model can recognize its own biases and, through a modified decoding step, reduce them during text generation without additional training or curated lists.

Contribution

The paper presents a decoding algorithm that lets a language model diagnose and reduce its own biases, given only a textual description of the undesired behavior, without modifying model parameters or requiring extra training.

Findings

01 Pretrained models can recognize their own biases.

02 Self-debiasing reduces problematic outputs during generation.

03 Approach does not need curated lists or retraining.
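
The first finding can be illustrated with a small sketch of how a self-diagnosis score might be computed. It assumes the model is shown a text followed by a yes/no question about an attribute, and that the score is the share of probability the model puts on "Yes" versus "No". The prompt wording, the function names, and the two-way softmax are illustrative assumptions, not details given on this page; the logits would come from whatever language model is being probed.

```python
import math


def build_diagnosis_prompt(text: str, attribute: str) -> str:
    """Hypothetical prompt template asking the model about its own output."""
    return f'"{text}"\nQuestion: Does the above text contain {attribute}?\nAnswer:'


def diagnosis_score(logit_yes: float, logit_no: float) -> float:
    """Probability of "Yes", renormalized over the two answers "Yes" and "No".

    logit_yes / logit_no are the model's next-token logits for the two
    answer words after the prompt (assumed to be supplied by the caller).
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(logit_yes, logit_no)
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


if __name__ == "__main__":
    prompt = build_diagnosis_prompt("some generated sentence", "a threat")
    print(prompt)
    # Made-up logits standing in for a real model's output.
    print(diagnosis_score(logit_yes=1.2, logit_no=-0.3))
```

A score above some threshold (for example 0.5) would then count as the model flagging its own text; where to set that threshold is a design choice this sketch leaves open.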

Abstract

When trained on large, unfiltered crawls from the internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: they often generate racist, sexist, violent or otherwise toxic language. As large models require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In this paper, we first demonstrate a surprising finding: pretrained language models recognize, to a considerable degree, their undesirable biases and the toxicity of the content they produce. We refer to this capability as self-diagnosis. Based on this finding, we then propose a decoding algorithm that, given only a textual description of the undesired behavior, reduces the probability of a language model producing problematic text. We refer to this approach as self-debiasing. Self-debiasing does…
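
The abstract describes the decoding algorithm only at a high level. The following is a minimal sketch of one way such a step could work, under stated assumptions: the model's next-token distribution is computed twice, once for the plain input and once for the same input prefixed with a textual description that encourages the undesired behavior, and tokens that become more likely under the second prompt are scaled down. The function name `self_debias`, the exponential penalty, and the `decay` value are illustrative choices, not taken from the text above.

```python
import math
from typing import Dict


def self_debias(
    p_plain: Dict[str, float],
    p_biased: Dict[str, float],
    decay: float = 50.0,
) -> Dict[str, float]:
    """Rescale a next-token distribution to suppress undesired tokens.

    p_plain:  token -> probability given the original input.
    p_biased: token -> probability given the input prefixed with a
              description of the undesired behavior.
    decay:    assumed penalty strength; larger values suppress harder.
    """
    scaled = {}
    for token, p in p_plain.items():
        delta = p - p_biased.get(token, 0.0)
        # delta < 0 means the bias-encouraging prompt makes this token
        # more likely, so it is treated as undesired and penalized.
        alpha = 1.0 if delta >= 0 else math.exp(decay * delta)
        scaled[token] = alpha * p
    # Renormalize so the result is again a probability distribution.
    total = sum(scaled.values())
    return {token: value / total for token, value in scaled.items()}


if __name__ == "__main__":
    # Toy two-token vocabulary with made-up probabilities.
    plain = {"kind": 0.5, "rude": 0.5}
    biased = {"kind": 0.2, "rude": 0.8}
    print(self_debias(plain, biased))
```

Because the sketch only rescales output probabilities, it leaves the model's parameters untouched, which matches the claim that no extra training is needed.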

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques · Topic Modeling