Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework

Matan Halevy; Camille Harris; Amy Bruckman; Diyi Yang; Ayanna Howard

arXiv:2109.13137·cs.CL·September 28, 2021

TL;DR

This paper introduces an ensemble framework that reduces racial biases in toxic language detection, especially against African American English, by using specialized classifiers and fairness metrics, with minimal impact on accuracy.

Contribution

The paper presents a novel ensemble approach that effectively mitigates racial biases in toxic language classifiers, addressing limitations of previous single-model bias-remediation methods.

Findings

01

Significant reduction in racial bias metrics across datasets.

02

Ensemble framework maintains high classification performance.

03

Demonstrates unlearning of annotation biases related to African American English.
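A reduction in racial bias is typically read off per-group error rates. The sketch below is a minimal illustration of one such measure, the gap in false positive rate between two dialect groups. It is an assumption made for illustration: this page does not name the fairness metrics the paper uses, and the function names, the 0/1 label encoding, and the group tags `"aae"` and `"other"` are all hypothetical.

```python
from typing import Hashable, Sequence


def false_positive_rate(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Share of non-toxic items (label 0) that were predicted toxic (pred 1)."""
    negative_preds = [p for y, p in zip(labels, preds) if y == 0]
    if not negative_preds:
        # No non-toxic items in this slice, so the rate is undefined.
        return float("nan")
    return sum(negative_preds) / len(negative_preds)


def fpr_gap(
    labels: Sequence[int],
    preds: Sequence[int],
    groups: Sequence[Hashable],
    group_a: Hashable,
    group_b: Hashable,
) -> float:
    """False positive rate of group_a minus that of group_b (0.0 means parity)."""

    def rate(group: Hashable) -> float:
        idx = [i for i, g in enumerate(groups) if g == group]
        return false_positive_rate(
            [labels[i] for i in idx], [preds[i] for i in idx]
        )

    return rate(group_a) - rate(group_b)


# Toy data, invented for illustration only (hypothetical group tags).
labels = [0, 0, 1, 0, 0, 1]
preds = [1, 0, 1, 0, 0, 1]
groups = ["aae", "aae", "aae", "other", "other", "other"]
gap = fpr_gap(labels, preds, groups, "aae", "other")  # 0.5 - 0.0 = 0.5
```

A positive gap here means non-toxic text from the first group is flagged more often than non-toxic text from the second; a debiasing method would aim to move it toward zero without lowering overall accuracy.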

Abstract

Recent research has demonstrated that racial biases against users who write African American English exist in popular toxic language datasets. While previous work has focused on a single fairness criterion, we propose to use additional descriptive fairness metrics to better understand the source of these biases. We demonstrate that different benchmark classifiers, as well as two in-process bias-remediation techniques, propagate racial biases even in a larger corpus. We then propose a novel ensemble framework that uses a specialized classifier fine-tuned to the African American English dialect. We show that our proposed framework substantially reduces the racial biases that the model learns from these datasets. We demonstrate how the ensemble framework improves fairness metrics across all sample datasets with minimal impact on the classification performance, and provide empirical…
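The abstract describes an ensemble built around a specialized classifier fine-tuned to African American English. One way such an ensemble could be wired is sketched below, purely as an assumption: the page does not say how the paper combines its models, so the routing rule, the dialect estimator, the thresholds, and every name here are hypothetical placeholders and not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable

# A scorer maps a text to a probability in [0, 1].
Scorer = Callable[[str], float]


@dataclass
class DialectRoutedEnsemble:
    """Hypothetical ensemble: route each text to one of two toxicity scorers."""

    general_toxicity: Scorer      # assumed: trained on the full corpus
    specialized_toxicity: Scorer  # assumed: fine-tuned on AAE text
    aae_probability: Scorer       # assumed: estimates whether text is AAE
    dialect_threshold: float = 0.5
    toxicity_threshold: float = 0.5

    def predict(self, text: str) -> bool:
        """Return True if the routed scorer marks the text as toxic."""
        if self.aae_probability(text) >= self.dialect_threshold:
            score = self.specialized_toxicity(text)
        else:
            score = self.general_toxicity(text)
        return score >= self.toxicity_threshold


# Stub scorers stand in for real models so the sketch runs on its own.
ensemble = DialectRoutedEnsemble(
    general_toxicity=lambda text: 0.9,
    specialized_toxicity=lambda text: 0.1,
    aae_probability=lambda text: 1.0 if "aae" in text else 0.0,
)
routed_to_specialized = ensemble.predict("aae sample")   # uses score 0.1
routed_to_general = ensemble.predict("other sample")     # uses score 0.9
```

Hard routing is only one design choice; a weighted average of the two scores, with the dialect probability as the weight, would be an equally plausible reading of "ensemble" under the same assumptions.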

Code & Models

Repositories

matanhalevy/DebiasingHateDetectionAAE
PyTorch, official
