# Racial Bias in Hate Speech and Abusive Language Detection Datasets

**Authors:** Thomas Davidson, Debasmita Bhattacharya, Ingmar Weber

arXiv: 1905.12516 · 2019-05-30

## TL;DR

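This paper investigates racial bias in Twitter datasets annotated for hate speech and abusive language, and in classifiers trained on them. It finds systematic bias: the classifiers flag tweets written in African-American English as abusive at substantially higher rates than tweets written in Standard American English.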

## Contribution

It provides an empirical analysis of racial bias across five annotated Twitter datasets and highlights the potential for discriminatory impact if abusive language detection systems trained on them are deployed.

## Key findings

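- All five datasets show evidence of systematic racial bias against African-American English.
- Classifiers trained on these datasets predict that tweets written in African-American English are abusive at substantially higher rates than tweets written in Standard American English.
- If deployed, such systems would have a disproportionate negative impact on African-American social media users, who are often the targets of the abuse being detected.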
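
The comparison behind these findings can be illustrated with a minimal sketch. It assumes that some classifier's binary predictions (1 = abusive, 0 = not) are already available for two groups of tweets, one written in African-American English (AAE) and one in Standard American English (SAE). The function names and the toy predictions are hypothetical and are not taken from the paper; this is not the authors' exact procedure, only the basic rate comparison the summary describes.

```python
from typing import Dict, Sequence


def flagged_rate(predictions: Sequence[int]) -> float:
    """Share of tweets labelled abusive (1 = abusive, 0 = not abusive)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(predictions) / len(predictions)


def rate_disparity(aae_preds: Sequence[int], sae_preds: Sequence[int]) -> Dict[str, float]:
    """Compare predicted abuse rates between AAE and SAE tweets."""
    aae = flagged_rate(aae_preds)
    sae = flagged_rate(sae_preds)
    return {
        "aae_rate": aae,
        "sae_rate": sae,
        "difference": aae - sae,
        # A ratio above 1 means AAE tweets are flagged more often than SAE tweets.
        "ratio": aae / sae if sae > 0 else float("inf"),
    }


# Made-up predictions for illustration only; these are NOT results from the paper.
aae_predictions = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # 4 of 10 flagged
sae_predictions = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]  # 2 of 10 flagged

result = rate_disparity(aae_predictions, sae_predictions)
print(result)
```

A ratio well above 1 on real data would correspond to the kind of disparity the paper reports. A careful analysis would also need to quantify uncertainty, for example by resampling, which this sketch leaves out.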

## Abstract

Technologies for abusive language detection are being developed and applied with little consideration of their potential biases. We examine racial bias in five different sets of Twitter data annotated for hate speech and abusive language. We train classifiers on these datasets and compare the predictions of these classifiers on tweets written in African-American English with those written in Standard American English. The results show evidence of systematic racial bias in all datasets, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates. If these abusive language detection systems are used in the field they will therefore have a disproportionate negative impact on African-American social media users. Consequently, these systems may discriminate against the groups who are often the targets of the abuse we are trying to detect.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.12516/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1905.12516/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/1905.12516/full.md

---
Source: https://tomesphere.com/paper/1905.12516