dictNN: A Dictionary-Enhanced CNN Approach for Classifying Hate Speech on Twitter

Maximilian Kupi, Michael Bodnar, Nikolas Schmidt, and Carlos Eduardo Posada

arXiv:2103.08780 · cs.CL · March 17, 2021

TL;DR

This paper introduces dictNN, a CNN-based model whose input fuses standard word embeddings with a vectorisation built from a crowd-sourced, continuously updated dictionary of hate words. On a merged dataset of 110,748 tweets, the dictionary-enhanced input raises the F1 macro score by seven percentage points.

Contribution

The paper presents a novel dictionary-enhanced vectorization method combined with CNNs for more reliable hate speech classification on social media.

Findings

01

F1 macro score increased by 7 percentage points with dictionary enhancement.

02

Model trained and tested on a merge of two established datasets (110,748 tweets in total).

03

Fusing the dictionary-based vectorisation with standard word embeddings increases the CNN model's predictive power.
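
The F1 macro score cited in the findings is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a frequent one. The sketch below is a minimal plain-Python illustration of that metric, not the authors' evaluation code; the function name `macro_f1` and the toy labels `"hate"` and `"none"` are assumptions made for this example.

```python
def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); define it as 0 when the class
        # is neither present nor predicted correctly at all.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical): a classifier that never predicts "hate".
y_true = ["hate", "none", "none", "none"]
y_pred = ["none", "none", "none", "none"]
print(macro_f1(y_true, y_pred))
```

In this toy case three of four predictions are correct, yet the missed minority class pulls the macro score well below the accuracy, which is why the metric suits imbalanced hate speech data.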

Abstract

Hate speech on social media is a growing concern, and automated methods have so far been sub-par at reliably detecting it. A major challenge lies in the potentially evasive nature of hate speech due to the ambiguity and fast evolution of natural language. To tackle this, we introduce a vectorisation based on a crowd-sourced and continuously updated dictionary of hate words and propose fusing this approach with standard word embedding in order to improve the classification performance of a CNN model. To train and test our model we use a merge of two established datasets (110,748 tweets in total). By adding the dictionary-enhanced input, we are able to increase the CNN model's predictive power and increase the F1 macro score by seven percentage points.
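
One way to picture the fusion described in the abstract is to append a dictionary-derived value to each token's word embedding before the sequence is passed to a CNN. The sketch below shows that idea in plain Python under stated assumptions: the concatenation scheme, the names `HATE_DICT`, `EMBEDDINGS`, and `fuse`, the placeholder tokens, and all numeric values are hypothetical, and the paper's actual vectorisation may differ.

```python
# Toy stand-ins (assumptions, not taken from the paper): a tiny
# hate-word dictionary with scores and 3-dimensional word embeddings.
HATE_DICT = {"slur_a": 1.0, "slur_b": 0.6}
EMBEDDINGS = {
    "you": [0.1, 0.3, -0.2],
    "are": [0.0, 0.1, 0.4],
    "slur_a": [0.7, -0.5, 0.2],
}
EMBED_DIM = 3


def fuse(tweet, max_len=5):
    """Build a (max_len x (EMBED_DIM + 1)) matrix for one tweet.

    Each row is the token's embedding followed by its dictionary
    score (0.0 if the token is not in the dictionary).
    """
    tokens = tweet.lower().split()[:max_len]
    rows = []
    for tok in tokens:
        emb = EMBEDDINGS.get(tok, [0.0] * EMBED_DIM)  # zeros for unknown words
        rows.append(emb + [HATE_DICT.get(tok, 0.0)])
    # Pad short tweets with zero rows so every input has the same shape.
    while len(rows) < max_len:
        rows.append([0.0] * (EMBED_DIM + 1))
    return rows


matrix = fuse("You are slur_a")
for row in matrix:
    print(row)
```

Because the dictionary is a separate lookup table, it could in principle be refreshed as new terms appear without retraining the word embeddings; whether a deployed model then needs retraining depends on details this sketch does not cover.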

Code & Models

Repositories

MaximilianKupi/nlp-project
PyTorch · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection