Hate Speech Detection in Clubhouse

Hadi Mansourifar; Dana Alsagheer; Reza Fathi; Weidong Shi; Lan Ni; Yan Huang

arXiv:2106.13238·cs.LG·July 13, 2021

Open Access

TL;DR

This paper introduces the first dataset for hate speech detection in voice chat rooms like Clubhouse, analyzing its characteristics and demonstrating that Perspective Scores outperform traditional text features in detection tasks.

Contribution

It presents the first dataset from Clubhouse for hate speech detection and evaluates high-level features like Perspective Scores for improved accuracy.

Findings

01. Perspective Scores outperform Bag of Words and Word2Vec

02. First dataset collection from Clubhouse for hate speech detection

03. Analysis of hate speech instances using statistical methods

Abstract

With the rise of voice chat rooms, a vast body of data is becoming available to the research community for natural language processing tasks. Moderators in voice chat rooms actively monitor discussions and remove participants who use offensive language. However, this moderation makes hate speech detection even more difficult, because some participants find creative ways to articulate hate speech. Hate speech detection is therefore challenging on new social media platforms like Clubhouse. To the best of our knowledge, all existing hate speech datasets have been collected from text sources such as Twitter. In this paper, we take the first step by collecting a significant dataset from Clubhouse, a rising star in the social media industry. We analyze the collected instances from a statistical point of view using Google Perspective Scores. Our experiments show that the Perspective Scores can outperform…
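To make the feature comparison concrete, the following is a minimal sketch of how the two kinds of representation differ in shape. It is not the authors' pipeline. The attribute names follow the public Perspective API, but which attributes the paper used is an assumption, and the utterances and score values are illustrative placeholders rather than data from the Clubhouse dataset. No API call is made; `placeholder_scores` stands in for a response.

```python
from collections import Counter

# Attribute names from the public Perspective API.
# Assumption: the paper's exact attribute set is not stated in this summary.
ATTRIBUTES = [
    "TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK",
    "INSULT", "PROFANITY", "THREAT",
]


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def score_features(scores):
    """Order per-attribute scores into a fixed-length vector.

    Attributes missing from the dict default to 0.0.
    """
    return [float(scores.get(name, 0.0)) for name in ATTRIBUTES]


# Hypothetical transcribed utterances (placeholders, not dataset samples).
utterances = ["you people are the worst", "thanks for hosting this room"]
vocabulary = sorted({w for u in utterances for w in u.lower().split()})

# Placeholder values standing in for scores returned by a scoring service.
placeholder_scores = {"TOXICITY": 0.8, "INSULT": 0.7}

bow_vector = bag_of_words(utterances[0], vocabulary)
score_vector = score_features(placeholder_scores)

# The bag-of-words vector grows with the vocabulary;
# the score vector always has one entry per attribute.
print(len(bow_vector), len(score_vector))
```

Either vector could then be passed to any standard classifier. The design point is that the score vector stays small and dense regardless of vocabulary size, while the bag-of-words vector is sparse and grows with every new word.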

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Internet Traffic Analysis and Secure E-voting