On the Role of Speech Data in Reducing Toxicity Detection Bias

Samuel J. Bell; Mariano Coria Meglioli; Megan Richards; Eduardo Sánchez; Christophe Ropers; Skyler Wang; Adina Williams; Levent Sagun; Marta R. Costa-jussà

arXiv:2411.08135 · cs.CL · May 19, 2025

TL;DR

This paper investigates whether speech-based toxicity detection systems exhibit less bias against group mentions than text-based systems. It finds that improving classifiers helps more than improving transcription pipelines, and it releases new group annotations for the MuTox dataset.

Contribution

It introduces high-quality group annotations for the MuTox dataset and systematically compares speech- and text-based toxicity classifiers to assess bias reduction.

Findings

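1. Access to speech data during inference reduces bias against group mentions, particularly for ambiguous and disagreement-inducing samples.

2. Improving classifiers is more helpful for reducing group bias than improving transcription pipelines.

3. The group annotations are publicly released, along with recommendations for future toxicity dataset construction.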

Abstract

Text toxicity detection systems exhibit significant biases, producing disproportionate rates of false positives on samples mentioning demographic groups. But what about toxicity detection in speech? To investigate the extent to which text-based biases are mitigated by speech-based systems, we produce a set of high-quality group annotations for the multilingual MuTox dataset, and then leverage these annotations to systematically compare speech- and text-based toxicity classifiers. Our findings indicate that access to speech data during inference supports reduced bias against group mentions, particularly for ambiguous and disagreement-inducing samples. Our results also suggest that improving classifiers, rather than transcription pipelines, is more helpful for reducing group bias. We publicly release our annotations and provide recommendations for future toxicity dataset construction.
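
The bias described in the abstract, a disproportionate rate of false positives on samples that mention demographic groups, can be illustrated with a minimal Python sketch. This is not the paper's evaluation code: the `Sample` fields, the function names, and the toy data are all hypothetical, and the sketch assumes each sample carries a gold toxicity label, a classifier decision, and a binary group-mention annotation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    # All three fields are hypothetical names chosen for this illustration.
    is_toxic: bool         # gold toxicity label
    predicted_toxic: bool  # classifier decision
    mentions_group: bool   # group-mention annotation


def false_positive_rate(samples: List[Sample]) -> Optional[float]:
    """Share of truly non-toxic samples that the classifier flags as toxic."""
    negatives = [s for s in samples if not s.is_toxic]
    if not negatives:
        return None  # undefined when there are no non-toxic samples
    false_positives = sum(1 for s in negatives if s.predicted_toxic)
    return false_positives / len(negatives)


def fpr_gap(samples: List[Sample]) -> Optional[float]:
    """FPR on group-mentioning samples minus FPR on all other samples."""
    with_group = [s for s in samples if s.mentions_group]
    without_group = [s for s in samples if not s.mentions_group]
    fpr_with = false_positive_rate(with_group)
    fpr_without = false_positive_rate(without_group)
    if fpr_with is None or fpr_without is None:
        return None
    return fpr_with - fpr_without


# Toy data, invented for illustration only.
toy_samples = [
    Sample(is_toxic=False, predicted_toxic=True, mentions_group=True),
    Sample(is_toxic=False, predicted_toxic=False, mentions_group=True),
    Sample(is_toxic=True, predicted_toxic=True, mentions_group=True),
    Sample(is_toxic=False, predicted_toxic=False, mentions_group=False),
    Sample(is_toxic=False, predicted_toxic=False, mentions_group=False),
    Sample(is_toxic=False, predicted_toxic=True, mentions_group=False),
    Sample(is_toxic=False, predicted_toxic=False, mentions_group=False),
    Sample(is_toxic=True, predicted_toxic=True, mentions_group=False),
]

if __name__ == "__main__":
    print(fpr_gap(toy_samples))
```

A positive gap means non-toxic samples that mention a group are flagged more often than other non-toxic samples. Under these assumptions, computing the same gap for a text-based and a speech-based classifier on the same samples is one simple way to compare their group bias.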

Taxonomy

Topics: Speech Recognition and Synthesis · Employee Welfare and Language Studies · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training