Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview

Deven Shah; H. Andrew Schwartz; Dirk Hovy

arXiv:1912.11078 · cs.CL · September 15, 2020

TL;DR

This paper introduces a unifying conceptual framework for understanding and categorizing predictive biases in NLP models, aiming to organize existing research and guide future efforts.

Contribution

It proposes a general mathematical definition of predictive bias and identifies four main origins, unifying diverse bias mitigation approaches within a single framework.

Findings

01 Summarizes NLP bias literature
02 Defines four bias origins: label, selection, overamplification, semantic
03 Guides future bias research in NLP

Abstract

An increasing number of works in natural language processing have addressed the effect of bias on the predicted outcomes, introducing mitigation techniques that act on different parts of the standard NLP pipeline (data and models). However, these works have been conducted in isolation, without a unifying framework to organize efforts within the field. This leads to repetitive approaches, and puts an undue focus on the effects of bias, rather than on their origins. Research focused on bias symptoms rather than the underlying origins could limit the development of effective countermeasures. In this paper, we propose a unifying conceptualization: the predictive bias framework for NLP. We summarize the NLP literature and propose a general mathematical definition of predictive bias in NLP along with a conceptual framework, differentiating four main origins of biases: label bias, selection…
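To make the idea of a predictive bias more concrete, the minimal Python sketch below measures one observable symptom of it: a gap in error rates between demographic groups. It is an illustrative check under simple assumptions (one true label, one predicted label, and one group label per example), and the function names error_rates_by_group and max_error_gap are hypothetical. It is not the paper's formal definition of predictive bias, and by the paper's own argument it says nothing about which of the four origins produced the gap.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return the misclassification rate for each group label.

    Illustrative helper (hypothetical name): assumes parallel sequences of
    true labels, predicted labels, and group labels of equal length.
    """
    errors = defaultdict(int)
    counts = defaultdict(int)
    for true_label, pred_label, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        errors[group] += int(true_label != pred_label)
    return {group: errors[group] / counts[group] for group in counts}


def max_error_gap(y_true, y_pred, groups):
    """Largest difference in error rate between any two groups.

    0.0 means every group has the same error rate (no disparity on this
    metric). Empty input also returns 0.0.
    """
    rates = error_rates_by_group(y_true, y_pred, groups)
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Toy, made-up data: group "a" is predicted perfectly,
    # group "b" gets 2 of 3 predictions wrong.
    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 1, 1, 1, 0]
    groups = ["a", "a", "a", "b", "b", "b"]
    print(round(max_error_gap(y_true, y_pred, groups), 3))  # → 0.667
```

A single scalar gap is easy to report, but it hides which group is disadvantaged; printing the per-group rates from error_rates_by_group alongside it is usually more informative.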
