Unsupervised Domain Adaptation for Hate Speech Detection Using a Data Augmentation Approach
Sheikh Muhammad Sarwar, Vanessa Murdock

TL;DR
This paper introduces an unsupervised domain adaptation method that augments training data for hate speech detection models, substantially improving recall with no loss in precision across multiple models and datasets.
Contribution
It presents a novel data augmentation approach for unsupervised domain adaptation in hate speech detection, addressing vocabulary and disguise challenges.
Findings
Improved AUPRC by up to 42%
Recall increased by up to 278%
No loss in precision, sometimes improved
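The figures above are percentages, so it may help to see how such numbers are typically derived. The following is a minimal sketch, assuming the reported gains are relative changes over a baseline and that AUPRC is estimated by average precision; the paper's exact evaluation procedure is not given here. The function names and all numeric values are hypothetical illustrations, not results from the paper.

```python
def relative_change_pct(baseline, adapted):
    """Percent change of `adapted` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (adapted - baseline) / baseline * 100.0


def average_precision(labels, scores):
    """Average precision, a common step-wise estimate of AUPRC.

    labels: 0/1 ground truth (1 = hate speech)
    scores: model scores, higher = more likely hate speech
    Tied scores are kept in input order (no special tie handling).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos


# Hypothetical numbers, chosen only to illustrate the arithmetic.
recall_gain = relative_change_pct(0.10, 0.378)
toy_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

Under this reading, a large relative recall gain usually signals a low baseline recall on the target domain, so the absolute recall after adaptation should be read alongside the percentage.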
Abstract
Online harassment in the form of hate speech has been on the rise in recent years. Addressing the issue requires a combination of content moderation by people, aided by automatic detection methods. As content moderation is itself harmful to the people doing it, we desire to reduce the burden by improving the automatic detection of hate speech. Hate speech presents a challenge as it is directed at different target groups using a completely different vocabulary. Further, the authors of the hate speech are incentivized to disguise their behavior to avoid being removed from a platform. This makes it difficult to develop a comprehensive data set for training and evaluating hate speech detection models because the examples that represent one hate speech domain do not typically represent others, even within the same language or culture. We propose an unsupervised domain adaptation approach to…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting
