Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems

Svetlana Kiritchenko; Saif M. Mohammad

arXiv:1805.04508·cs.CL·May 14, 2018·31 cites

TL;DR

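This paper introduces the Equity Evaluation Corpus (EEC), 8,640 English sentences designed to expose racial and gender bias, and uses it to assess 219 sentiment analysis systems from SemEval-2018 Task 1. Several systems show statistically significant bias, consistently giving slightly higher sentiment intensity predictions for one race or one gender.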

Contribution

The paper presents the first benchmark dataset, EEC, for systematically evaluating racial and gender biases in sentiment analysis systems.

Findings

01

Many systems exhibit statistically significant bias towards certain races and genders.

02

Biases manifest as consistently, if only slightly, higher sentiment intensity predictions for one race or one gender.

03

The EEC dataset is publicly available for future bias assessments.

Abstract

Automatic machine learning systems can inadvertently accentuate and perpetuate inappropriate human biases. Past work on examining inappropriate biases has largely focused on just individual systems. Further, there is no benchmark dataset for examining inappropriate biases in systems. Here for the first time, we present the Equity Evaluation Corpus (EEC), which consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders. We use the dataset to examine 219 automatic sentiment analysis systems that took part in a recent shared task, SemEval-2018 Task 1 'Affect in Tweets'. We find that several of the systems show statistically significant bias; that is, they consistently provide slightly higher sentiment intensity predictions for one race or one gender. We make the EEC freely available.
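
The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the template strings, the `make_pairs` and `paired_t_statistic` helpers, and the example scores are all hypothetical, and the abstract does not say which statistical test was used, so a paired t statistic over matched sentence pairs is assumed here.

```python
import math
import statistics

# Hypothetical templates; the actual EEC templates may differ.
TEMPLATES = [
    "{person} feels angry.",
    "The conversation with {person} was great.",
]


def make_pairs(templates, group_a, group_b):
    """Build sentence pairs that differ only in the person word."""
    return [
        (t.format(person=a), t.format(person=b))
        for t in templates
        for a, b in zip(group_a, group_b)
    ]


def paired_t_statistic(scores_a, scores_b):
    """Return (mean difference, t statistic) for matched score lists.

    Assumption: a paired t-test is an appropriate way to check whether
    one group consistently receives higher predictions.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd / math.sqrt(len(diffs)))
    return mean_diff, t


# Made-up intensity predictions from an imaginary system for four pairs.
pairs = make_pairs(TEMPLATES, ["She", "My sister"], ["He", "My brother"])
scores_a = [0.52, 0.61, 0.48, 0.70]
scores_b = [0.50, 0.60, 0.45, 0.69]
mean_diff, t = paired_t_statistic(scores_a, scores_b)
```

A positive `mean_diff` with a large `t` would suggest the imaginary system scores the first group slightly but consistently higher. Deciding significance would additionally require comparing `t` against a critical value for `len(pairs) - 1` degrees of freedom.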

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Misinformation and Its Impacts