An Investigation Into Explainable Audio Hate Speech Detection

Jinmyeong An; Wonjun Lee; Yejin Jeon; Jungseul Ok; Yunsu Kim; Gary Geunbae Lee

arXiv:2408.06065 · cs.CL · August 13, 2024

TL;DR

This paper introduces the task of explainable audio hate speech detection, proposing two methods, creating a synthetic dataset, and demonstrating that end-to-end models with rationales improve detection accuracy.

Contribution

It presents the first approaches for explainable audio hate speech detection, including a synthetic dataset and two model architectures, with the E2E method showing superior performance.

Findings

01. The E2E approach outperforms the cascading approach on the IoU metric.

02. Including frame-level rationales improves detection accuracy.

03. The synthetic dataset enables training of explainable models.
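The IoU metric named in the first finding can be illustrated with a short sketch. This summary does not give the paper's exact formulation, so the version below is an assumption: rationales are represented as binary frame-level masks (1 means the frame is marked as evidence), and IoU is the number of frames marked in both masks divided by the number marked in either. The function name `frame_iou` and the empty-mask convention are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def frame_iou(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Intersection-over-union between two binary frame-level masks.

    Assumption: both masks cover the same utterance at the same frame rate.
    """
    if len(predicted) != len(gold):
        raise ValueError("masks must cover the same number of frames")
    # Frames marked as rationale in both masks.
    intersection = sum(1 for p, g in zip(predicted, gold) if p and g)
    # Frames marked as rationale in at least one mask.
    union = sum(1 for p, g in zip(predicted, gold) if p or g)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if union == 0 else intersection / union


# Predicted rationale covers frames 1-3, gold covers frames 2-4.
example_pred = [0, 1, 1, 1, 0, 0]
example_gold = [0, 0, 1, 1, 1, 0]
example_score = frame_iou(example_pred, example_gold)  # 2 shared / 4 total
```

In this toy case the two masks share two frames out of four marked overall, so the score is 0.5. A score of 1.0 would mean the predicted interval matches the gold interval frame for frame.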

Abstract

Research on hate speech has predominantly revolved around detection and interpretation from textual inputs, leaving verbal content largely unexplored. While there has been limited exploration into hate speech detection within verbal acoustic speech inputs, the aspect of interpretability has been overlooked. Therefore, we introduce a new task of explainable audio hate speech detection. Specifically, we aim to identify the precise time intervals, referred to as audio frame-level rationales, which serve as evidence for hate speech classification. Towards this end, we propose two different approaches: cascading and End-to-End (E2E). The cascading approach initially converts audio to transcripts, identifies hate speech within these transcripts, and subsequently locates the corresponding audio time frames. Conversely, the E2E approach processes audio utterances directly, which allows it to…
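The last step of the cascading approach described above, locating the audio time frames that correspond to flagged transcript content, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an upstream ASR and alignment step has already produced word-level timestamps, and that a text classifier has already flagged which words are rationales. The `Word` tuple format, the `rationale_intervals` function, and the `max_gap` parameter are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical alignment format: (token, start_sec, end_sec).
Word = Tuple[str, float, float]


def rationale_intervals(
    words: List[Word], flagged: List[bool], max_gap: float = 0.0
) -> List[Tuple[float, float]]:
    """Map flagged transcript words to merged audio time intervals.

    Flagged words whose gap to the previous interval is at most
    `max_gap` seconds are merged into that interval.
    """
    if len(words) != len(flagged):
        raise ValueError("need exactly one flag per word")
    intervals: List[Tuple[float, float]] = []
    for (_, start, end), is_rationale in zip(words, flagged):
        if not is_rationale:
            continue
        if intervals and start - intervals[-1][1] <= max_gap:
            # Extend the previous interval to cover this word.
            prev_start, prev_end = intervals[-1]
            intervals[-1] = (prev_start, max(prev_end, end))
        else:
            intervals.append((start, end))
    return intervals


# Placeholder tokens with made-up timestamps, for illustration only.
demo_words = [("a", 0.0, 0.5), ("b", 0.5, 1.0), ("c", 1.0, 1.5), ("d", 2.0, 2.5)]
demo_flags = [False, True, True, True]
demo_intervals = rationale_intervals(demo_words, demo_flags)
```

Here the adjacent flagged words "b" and "c" merge into one interval from 0.5 s to 1.5 s, while "d" starts after a 0.5 s pause and forms its own interval. The resulting intervals could then be converted to frame-level masks at whatever frame rate the evaluation uses.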

Taxonomy

Topics: Music and Audio Processing