Reducing Unintended Identity Bias in Russian Hate Speech Detection

Nadezhda Zueva; Madina Kabirova; Pavel Kalaidin

arXiv:2010.11666·cs.CL·October 23, 2020

TL;DR

This paper addresses unintended identity bias in Russian hate speech classifiers and proposes two simple techniques to reduce it: augmenting the training data with text generated by language models conditioned on protected-identity terms, and applying word dropout to those terms.

Contribution

It introduces simple, effective methods for mitigating bias in hate speech classifiers for Russian, enhancing fairness without complex model changes.

Findings

01

Bias reduction techniques decreased false positives related to identity words.

02

Methods improved fairness metrics in hate speech detection.

03

Approach is applicable to other languages and bias types.
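
One way to check the first finding, fewer false positives on identity words, is to compare the false positive rate on non-toxic texts that mention an identity term against the overall rate. The sketch below is illustrative only: the function names, the whitespace tokenization, and the binary label convention (1 = toxic, 0 = non-toxic) are assumptions, not the evaluation protocol used in the paper.

```python
def false_positive_rate(labels, preds):
    """Share of non-toxic examples (label 0) that were predicted toxic (pred 1).

    Returns None when there are no non-toxic examples to evaluate.
    """
    negatives = [p for y, p in zip(labels, preds) if y == 0]
    if not negatives:
        return None
    return sum(negatives) / len(negatives)


def subgroup_fpr(texts, labels, preds, term):
    """False positive rate restricted to texts that contain `term`.

    Assumption: simple lowercase whitespace tokenization; a real Russian
    pipeline would likely need lemmatization to match inflected forms.
    """
    idx = [i for i, t in enumerate(texts) if term in t.lower().split()]
    return false_positive_rate(
        [labels[i] for i in idx],
        [preds[i] for i in idx],
    )
```

A subgroup rate well above the overall rate suggests the classifier treats the term itself as a toxicity signal.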

Abstract

Toxicity has become a grave problem for many online communities and has been growing across many languages, including Russian. Hate speech creates an environment of intimidation and discrimination, and may even incite real-world violence. Both researchers and social platforms have long focused on developing models to detect toxicity in online communication. A common problem with these models is bias towards some words (e.g. woman, black, jew) that are not toxic but serve as triggers for the classifier due to model caveats. In this paper, we describe our efforts towards classifying hate speech in Russian and propose simple techniques for reducing unintended bias, such as generating training data with language models using terms and words related to protected identities as context, and applying word dropout to such words.
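
The word dropout idea mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the term list, the dropout probability `p`, the `<unk>` placeholder, and the choice to mask rather than delete tokens are all hypothetical.

```python
import random

# Hypothetical identity-term list taken from the abstract's examples;
# the paper's actual lexicon is not reproduced here.
IDENTITY_TERMS = frozenset({"woman", "black", "jew"})


def identity_word_dropout(tokens, identity_terms=IDENTITY_TERMS,
                          p=0.5, mask_token="<unk>", rng=None):
    """Replace each identity term with `mask_token` with probability `p`.

    Non-identity tokens are always kept. Applied to training examples only,
    so the classifier cannot rely on the identity term alone as a signal.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for tok in tokens:
        if tok.lower() in identity_terms and rng.random() < p:
            out.append(mask_token)
        else:
            out.append(tok)
    return out
```

Passing a seeded `random.Random` as `rng` makes the masking reproducible across training runs.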


Taxonomy

Methods: Dropout