Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English
Su Lin Blodgett, Brendan O'Connor

TL;DR
This paper investigates racial disparity in NLP by analyzing how language identification algorithms perform on tweets written in African-American English compared with tweets in other dialects, and it discusses the implications for fairness.
Contribution
The paper provides an empirical case study of racial disparity in NLP, focusing on how language identification accuracy differs for tweets written in African-American English.
Findings
Language identification algorithms perform worse on African-American English tweets than on tweets in other dialects.
Disparities in language identification can affect social media analysis.
Implications for fairness in NLP systems are discussed.
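One way to make the notion of disparity concrete is to compare, per author group, how often a language identifier labels known-English messages as English. The sketch below is illustrative only and is not the authors' method or code: the function names, the group labels "aae" and "other", and the toy records are all hypothetical, and the numbers are invented, not results from the paper.

```python
from collections import defaultdict


def english_recall_by_group(records):
    """Return, per group, the fraction of messages predicted as English.

    `records` is an iterable of (group, predicted_language) pairs, where
    every message is assumed to be English in truth (an assumption of
    this sketch).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, predicted in records:
        totals[group] += 1
        if predicted == "en":
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def recall_gap(recalls, group_a, group_b):
    """Difference in recall between two groups (group_b minus group_a)."""
    return recalls[group_b] - recalls[group_a]


# Hypothetical toy predictions; invented for illustration only.
toy_records = [
    ("aae", "en"), ("aae", "und"), ("aae", "en"), ("aae", "pt"),
    ("other", "en"), ("other", "en"), ("other", "en"), ("other", "und"),
]

recalls = english_recall_by_group(toy_records)
gap = recall_gap(recalls, "aae", "other")
print(recalls, gap)
```

A positive gap here would mean the identifier recognizes English less often for the first group. In practice the predictions would come from a real language identifier, and a real analysis would also need a way to assign messages to groups.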
Abstract
We highlight an important frontier in algorithmic fairness: disparity in the quality of natural language processing algorithms when applied to language from authors of different social groups. For example, current systems sometimes analyze the language of females and minorities more poorly than they do of whites and males. We conduct an empirical analysis of racial disparity in language identification for tweets written in African-American English, and discuss implications of disparity in NLP.
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling · Names, Identity, and Discrimination Research
