Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English

Su Lin Blodgett; Brendan O'Connor

arXiv:1707.00061·cs.CY·July 4, 2017·51 cites

TL;DR

This paper investigates racial disparities in NLP, specifically analyzing how language identification algorithms perform differently on African-American English tweets compared to other dialects, highlighting fairness issues.

Contribution

It provides an empirical case study of racial disparity in NLP, focusing on how language identification accuracy differs for tweets written in African-American English.

Findings

01

Language identification systems perform worse on African-American English tweets.

02

Disparities in language identification can affect social media analysis.

03

Implications for fairness in NLP systems are discussed.

Abstract

We highlight an important frontier in algorithmic fairness: disparity in the quality of natural language processing algorithms when applied to language from authors of different social groups. For example, current systems sometimes analyze the language of females and minorities more poorly than they do the language of whites and males. We conduct an empirical analysis of racial disparity in language identification for tweets written in African-American English, and discuss implications of disparity in NLP.
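To make "disparity in language identification" concrete, a minimal sketch of one way to measure it: for each author group, compute the share of English tweets that a language identifier labels as English, then take the gap between groups. The function `english_recall_by_group`, the group names `"aae"` and `"white"`, the `"en"` label, and the toy predictions are all hypothetical illustrations, not the authors' evaluation code, and the numbers are not results from the paper.

```python
from collections import defaultdict


def english_recall_by_group(records):
    """Return, per group, the fraction of tweets predicted as English.

    records: iterable of (group, predicted_language) pairs. Every tweet is
    assumed (hypothetically) to be English by ground truth, so this fraction
    is the identifier's recall on English for that group.
    """
    total = defaultdict(int)    # tweets seen per group
    correct = defaultdict(int)  # tweets labeled "en" per group
    for group, predicted in records:
        total[group] += 1
        if predicted == "en":
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}


# Hypothetical toy predictions; NOT data or results from the paper.
toy = [
    ("aae", "en"), ("aae", "da"), ("aae", "en"), ("aae", "fr"),
    ("white", "en"), ("white", "en"), ("white", "en"), ("white", "nl"),
]
recall = english_recall_by_group(toy)
gap = recall["white"] - recall["aae"]  # positive gap: "aae" tweets misidentified more often
print(recall, gap)
```

A per-group recall comparison like this is only one possible disparity measure; it isolates how often genuinely English text is rejected as non-English, which is the kind of error that can silently drop a group's tweets from downstream social media analysis.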

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling · Names, Identity, and Discrimination Research