ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality

Yu-Xiang Luo; Yi-Cheng Lin; Ming-To Chuang; Jia-Hung Chen; I-Ning Tsai; Pei Xing Kiew; Yueh-Hsuan Huang; Chien-Feng Liu; Yu-Chen Chen; Bo-Han Feng; Wenze Ren; Hung-yi Lee

arXiv:2505.15773 · eess.AS · May 22, 2025

TL;DR

ToxicTone is a comprehensive Mandarin audio dataset with detailed toxicity annotations, enabling improved detection of toxic speech by leveraging acoustic, linguistic, and emotional cues in spoken language.

Contribution

We introduce ToxicTone, the largest public annotated dataset of toxic speech in spoken Mandarin, and propose a multimodal detection framework that outperforms text-only models.

Findings

01. Our model achieves higher accuracy than baseline models.

02. Speech cues significantly improve toxicity detection.

03. The dataset captures diverse real-world communication scenarios.

Abstract

Despite extensive research on toxic speech detection in text, a critical gap remains in handling spoken Mandarin audio. The lack of annotated datasets that capture the unique prosodic cues and culturally specific expressions in Mandarin leaves spoken toxicity underexplored. To address this, we introduce ToxicTone -- the largest public dataset of its kind -- featuring detailed annotations that distinguish both forms of toxicity (e.g., profanity, bullying) and sources of toxicity (e.g., anger, sarcasm, dismissiveness). Our data, sourced from diverse real-world audio and organized into 13 topical categories, mirrors authentic communication scenarios. We also propose a multimodal detection framework that integrates acoustic, linguistic, and emotional features using state-of-the-art speech and emotion encoders. Extensive experiments show our approach outperforms text-only and baseline…
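The abstract describes a framework that integrates acoustic, linguistic, and emotional features. As a rough illustration of what such an integration could look like, the following minimal sketch fuses three feature vectors by concatenation and scores them with a single logistic unit. Everything here is an assumption made for illustration: the function names `concat_fusion` and `toxicity_score`, the embedding sizes, the placeholder values, and the choice of concatenation itself. The abstract does not say how the authors combine the features, and the real encoders are replaced by hard-coded vectors.

```python
import math
from typing import List, Sequence


def concat_fusion(
    acoustic: Sequence[float],
    linguistic: Sequence[float],
    emotional: Sequence[float],
) -> List[float]:
    """Fuse three per-utterance feature vectors by simple concatenation.

    Concatenation is an assumed fusion strategy, not the paper's confirmed one.
    """
    return [*acoustic, *linguistic, *emotional]


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def toxicity_score(
    features: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Score a fused feature vector with one logistic unit (hypothetical classifier head)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return sigmoid(logit)


# Placeholder embeddings standing in for the outputs of speech, text,
# and emotion encoders (sizes and values are made up for illustration).
acoustic = [0.2, -0.1]
linguistic = [0.5, 0.3, -0.4]
emotional = [0.9]

fused = concat_fusion(acoustic, linguistic, emotional)

# Untrained (all-zero) weights: the classifier is maximally uncertain.
weights = [0.0] * len(fused)
score = toxicity_score(fused, weights)
```

In a trained system the weights would be learned from the annotated utterances, and the single logistic unit would typically be replaced by a deeper classification head; the sketch only shows how the three feature streams can meet in one vector.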

Code & Models

Repositories

YuXiangLo/ToxicTone
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing