Detecting Hate Speech in Social Media
Shervin Malmasi, Marcos Zampieri

TL;DR
This paper applies supervised classification to detect hate speech in social media while distinguishing it from general profanity. Its lexical baselines reach 78% accuracy across three classes, and the results show that separating hate speech from profanity is the main difficulty.
Contribution
It introduces a lexical baseline approach for hate speech detection using character n-grams, word n-grams, and word skip-grams, and analyzes the main challenge of differentiating hate speech from profanity.
Findings
78% accuracy in classifying posts into three classes: hate speech, profanity, and neutral
The main challenge is distinguishing hate speech from general profanity
Several directions for future work on hate speech detection are discussed
Abstract
In this paper we examine methods to detect hate speech in social media, while distinguishing this from general profanity. We aim to establish lexical baselines for this task by applying supervised classification methods using a recently released dataset annotated for this purpose. As features, our system uses character n-grams, word n-grams and word skip-grams. We obtain results of 78% accuracy in identifying posts across three classes. Results demonstrate that the main challenge lies in discriminating profanity and hate speech from each other. A number of directions for future work are discussed.
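The three feature types named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenization, the lowercasing, and the default values for the n-gram order and skip distance are all assumptions chosen for illustration, and the classifier that would consume these features is not shown.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_skip_bigrams(tokens, k):
    """Return ordered word pairs with at most `k` tokens skipped between them.

    With k=0 this reduces to ordinary word bigrams.
    """
    pairs = []
    for i in range(len(tokens)):
        # The second word may sit up to k + 1 positions after the first.
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs


def extract_features(post, char_n=3, word_n=2, skip_k=1):
    """Count all three feature types for one post (hypothetical defaults)."""
    text = post.lower()
    tokens = text.split()  # assumption: simple whitespace tokenization
    feats = Counter()
    feats.update(("char", g) for g in char_ngrams(text, char_n))
    feats.update(("word", g) for g in word_ngrams(tokens, word_n))
    feats.update(("skip", g) for g in word_skip_bigrams(tokens, skip_k))
    return feats
```

Prefixing each feature with its type keeps the three feature spaces separate in one sparse count vector, which a supervised classifier could then take as input.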
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Swearing, Euphemism, Multilingualism
