Hate Speech Detection in Clubhouse
Hadi Mansourifar, Dana Alsagheer, Reza Fathi, Weidong Shi, Lan Ni, Yan Huang

TL;DR
This paper introduces the first dataset for hate speech detection in voice chat rooms like Clubhouse, analyzing its characteristics and demonstrating that Perspective Scores outperform traditional text features in detection tasks.
Contribution
It presents the first dataset from Clubhouse for hate speech detection and evaluates high-level features like Perspective Scores for improved accuracy.
Findings
Perspective Scores outperform Bag of Words and Word2Vec
First dataset collection from Clubhouse for hate speech detection
Analysis of hate speech instances using statistical methods
Abstract
With the rise of voice chat rooms, a vast body of data can become available to the research community for natural language processing tasks. Moderators in voice chat rooms actively monitor discussions and remove participants who use offensive language. However, this makes hate speech detection even more difficult, since some participants find creative ways to articulate hate speech. Hate speech detection is therefore challenging on new social media platforms like Clubhouse. To the best of our knowledge, all existing hate speech datasets have been collected from text sources like Twitter. In this paper, we take the first step toward collecting a significant dataset from Clubhouse, a rising star in the social media industry. We analyze the collected instances from a statistical point of view using Google Perspective Scores. Our experiments show that the Perspective Scores can outperform…
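To make the feature comparison in the abstract concrete, here is a minimal sketch of how one transcribed utterance might be turned into a Perspective Score vector and, for contrast, a Bag of Words vector. It is not the authors' pipeline. The request and response shapes follow the public Perspective API `comments:analyze` method, but the choice of attributes, the helper names (`build_request`, `scores_to_features`, `bag_of_words`), and the sample response values are all assumptions made for illustration. No network call is made.

```python
from collections import Counter

# Attribute names exist in the public Perspective API; which ones the
# paper actually used is an assumption of this sketch.
ATTRIBUTES = ["TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK",
              "INSULT", "PROFANITY", "THREAT"]


def build_request(text, attributes=ATTRIBUTES):
    """Build the JSON body for a comments:analyze request (not sent here)."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {name: {} for name in attributes},
    }


def scores_to_features(response, attributes=ATTRIBUTES):
    """Map an analyze response to a fixed-order vector of summary scores.

    Attributes missing from the response are filled with 0.0.
    """
    scores = response.get("attributeScores", {})
    return [
        scores.get(name, {}).get("summaryScore", {}).get("value", 0.0)
        for name in attributes
    ]


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the lowercased text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical response with made-up values, shaped like a real API reply.
SAMPLE_RESPONSE = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.91, "type": "PROBABILITY"}},
        "INSULT": {"summaryScore": {"value": 0.75, "type": "PROBABILITY"}},
    }
}

perspective_vector = scores_to_features(SAMPLE_RESPONSE)
bow_vector = bag_of_words("You are you", ["you", "are", "hate"])
```

The Perspective vector has one dimension per requested attribute regardless of vocabulary size, while the Bag of Words vector grows with the vocabulary. Either vector could then be passed to any standard classifier.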
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Internet Traffic Analysis and Secure E-voting
