TL;DR
This paper introduces an ensemble framework that reduces racial biases in toxic language detection, especially against African American English, by using specialized classifiers and fairness metrics, with minimal impact on accuracy.
Contribution
The paper presents a novel ensemble approach that effectively mitigates racial biases in toxic language classifiers, addressing limitations of previous single-model bias-remediation methods.
Findings
Significant reduction in racial bias metrics across datasets.
Ensemble framework maintains high classification performance.
Demonstrates unlearning of annotation biases related to African American English.
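The "racial bias metrics" above are not named in this summary. One descriptive metric often used for this kind of analysis is the gap in false positive rates between dialect groups, meaning how much more often non-toxic text from one group is flagged as toxic. The sketch below assumes that metric. The function names, group labels, and toy data are hypothetical and do not come from the paper.

```python
from collections import defaultdict

def false_positive_rates(y_true, y_pred, groups):
    """Per-group false positive rate: share of non-toxic texts flagged as toxic.

    Assumes labels are 0 (non-toxic) and 1 (toxic); groups with no
    non-toxic examples are omitted from the result.
    """
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:
            negatives[group] += 1
            if pred == 1:
                false_pos[group] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}

def fpr_gap(y_true, y_pred, groups, group_a, group_b):
    """Difference in false positive rate between two groups (a minus b)."""
    rates = false_positive_rates(y_true, y_pred, groups)
    return rates[group_a] - rates[group_b]

# Toy data (illustrative only, not from the paper).
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 0, 1]
groups = ["aae"] * 5 + ["other"] * 5

rates = false_positive_rates(y_true, y_pred, groups)
gap = fpr_gap(y_true, y_pred, groups, "aae", "other")
```

A gap near zero would indicate that non-toxic text from both groups is flagged at a similar rate. A large positive gap would indicate disproportionate flagging of the first group.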
Abstract
Recent research has demonstrated how racial biases against users who write African American English exist in popular toxic language datasets. While previous work has focused on a single fairness criterion, we propose to use additional descriptive fairness metrics to better understand the source of these biases. We demonstrate that different benchmark classifiers, as well as two in-process bias-remediation techniques, propagate racial biases even in a larger corpus. We then propose a novel ensemble framework that uses a specialized classifier that is fine-tuned to the African American English dialect. We show that our proposed framework substantially reduces the racial biases that the model learns from these datasets. We demonstrate how the ensemble framework improves fairness metrics across all sample datasets with minimal impact on the classification performance, and provide empirical…
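The abstract does not say how the ensemble combines the general and specialized classifiers. As a minimal sketch, assume a simple routing design: a dialect estimator scores each text, and texts scored as likely African American English go to the specialized classifier while all others go to the general one. Every name, the threshold, and the stub models below are hypothetical placeholders, not the authors' implementation.

```python
from typing import Callable

# A classifier maps a text to an estimated probability of toxicity.
Classifier = Callable[[str], float]

def make_ensemble(
    general: Classifier,
    specialized: Classifier,
    dialect_prob: Callable[[str], float],
    threshold: float = 0.5,  # assumed routing cutoff, not from the paper
) -> Classifier:
    """Build a router that picks a classifier per text based on dialect score."""
    def predict(text: str) -> float:
        if dialect_prob(text) >= threshold:
            return specialized(text)
        return general(text)
    return predict

# Stub components for illustration only; real ones would be trained models.
def stub_general(text: str) -> float:
    return 0.9

def stub_specialized(text: str) -> float:
    return 0.2

def stub_dialect_prob(text: str) -> float:
    # Hypothetical marker standing in for a real dialect estimator.
    return 1.0 if "[aae]" in text else 0.0

ensemble = make_ensemble(stub_general, stub_specialized, stub_dialect_prob)
score_routed = ensemble("[aae] example text")
score_default = ensemble("example text")
```

Other combination schemes, such as weighting both classifiers' outputs by the dialect score, would fit the abstract's description equally well. Hard routing is shown here only because it is the simplest to read.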
