A study of text representations in Hate Speech Detection

Chrysoula Themeli; George Giannakopoulos; Nikiforos Pittaras

arXiv:2102.04521 · cs.CL · February 10, 2021 · 1 cites

TL;DR

This paper evaluates various text representation methods for hate speech detection on social media, finding that simple keyword frequency features combined with classifiers yield the best results.

Contribution

It systematically compares multiple text representations and classifiers, highlighting the effectiveness of simple features like BoW and N-gram graphs for hate speech detection.

Findings

01. Hate-keyword frequency (BoW) features outperform the other representations tested.

02. Pre-trained word embeddings (GloVe) and N-gram graphs come next.

03. Combining representations with Logistic Regression yields top performance.

Abstract

The pervasiveness of the Internet and social media has enabled the rapid and anonymous spread of Hate Speech content on microblogging platforms such as Twitter. Current EU and US legislation against hateful language, in conjunction with the large amount of data produced on these platforms, has made automatic tools a necessary component of the Hate Speech detection task and pipeline. In this study, we examine the performance of several diverse text representation techniques, paired with multiple classification algorithms, on the automatic Hate Speech detection and abusive language discrimination task. We perform an experimental evaluation on binary and multiclass datasets, paired with significance testing. Our results show that simple hate-keyword frequency features (BoW) work best, followed by pre-trained word embeddings (GloVe) as well as N-gram graphs (NGGs): a graph-based…
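To make the best-performing representation concrete, the following is a minimal sketch of hate-keyword frequency (BoW) features: each text is mapped to a vector of counts, one per lexicon entry. It is an illustration under stated assumptions, not the authors' implementation. The lexicon `KEYWORDS`, the tokenizer, and the function names are hypothetical placeholders; the paper's actual keyword list and preprocessing are not given on this page.

```python
import re
from collections import Counter

# Hypothetical placeholder lexicon (assumption): stands in for a real
# hate-keyword list, which is not reproduced on this page.
KEYWORDS = ["slur_a", "slur_b", "insult_c"]


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"\w+", text.lower())


def keyword_frequency_vector(text, keywords=KEYWORDS):
    """Return one count per lexicon keyword, in lexicon order."""
    counts = Counter(tokenize(text))
    return [counts[k] for k in keywords]


# Each text becomes a fixed-length vector that a classifier
# (for example Logistic Regression) could consume as input features.
example_vector = keyword_frequency_vector("Slur_a again, slur_a! And insult_c.")
```

The vector length equals the lexicon size, so the representation stays small and interpretable regardless of vocabulary size; the trade-off is that any term missing from the lexicon is invisible to the classifier.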

Code & Models

Repositories

cthem/hate-speech-detection
none · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting

Methods: Logistic Regression