Improving Automatic Hate Speech Detection with Multiword Expression Features

Nicolas Zampieri; Irina Illina; Dominique Fohr

arXiv:2106.00237 · cs.CL · June 2, 2021 · 1 citation

Open Access

TL;DR

This paper introduces multiword expression (MWE) features into deep neural networks for hate speech detection and reports significant macro-F1 gains on two hate speech tweet corpora.

Contribution

The paper is the first to incorporate MWE features into hate speech detection models, integrating them through a novel multi-branch neural network architecture that improves detection performance.
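To make "MWE features" concrete, here is a minimal sketch of the two feature types the abstract names (MWE categories and MWE embeddings). It assumes a tweet has already been tokenized and annotated with MWE spans. The category inventory, the `(start, end, category)` span format, and mean pooling of word vectors over MWE tokens are hypothetical choices for illustration, not the paper's documented procedure.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical MWE category inventory (an assumption, not the paper's list).
MWE_CATEGORIES = ["idiom", "verb_particle", "noun_compound"]

# Hypothetical span format: (start token index, end index exclusive, category).
Span = Tuple[int, int, str]


def category_vector(spans: Sequence[Span]) -> List[int]:
    """Count the MWE occurrences of each category in one tweet."""
    counts = [0] * len(MWE_CATEGORIES)
    for _, _, category in spans:
        counts[MWE_CATEGORIES.index(category)] += 1
    return counts


def mwe_embedding(tokens: Sequence[str], spans: Sequence[Span],
                  vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Mean of the word vectors of all tokens covered by an MWE.

    Tokens missing from `vectors` are skipped; if none are found,
    a zero vector of length `dim` is returned.
    """
    total = [0.0] * dim
    found = 0
    for start, end, _ in spans:
        for token in tokens[start:end]:
            vec = vectors.get(token)
            if vec is None:
                continue
            total = [t + v for t, v in zip(total, vec)]
            found += 1
    return [t / found for t in total] if found else total


tokens = ["he", "will", "kick", "the", "bucket", "soon"]
spans = [(2, 5, "idiom")]                                 # "kick the bucket"
toy_vectors = {"kick": [1.0, 0.0], "bucket": [0.0, 1.0]}  # toy 2-d vectors
print(category_vector(spans))                             # [1, 0, 0]
print(mwe_embedding(tokens, spans, toy_vectors, dim=2))   # [0.5, 0.5]
```

With word2vec, `toy_vectors` would be a static word-vector lookup; with BERT it would be replaced by contextual token vectors from the encoder. This sketch does not reproduce how the paper pools either kind.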

Findings

01. MWE features improve macro-F1 scores significantly.

02. BERT-based MWE embeddings outperform word2vec-based MWE embeddings.

03. A multi-branch neural network effectively integrates MWE information.
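Macro-F1, the metric behind the first finding, is the unweighted mean of per-class F1 scores, so a rare class such as hate speech counts as much as the majority class. A minimal sketch follows; the label names are hypothetical, and classes are taken from the union of true and predicted labels.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


y_true = ["hate", "normal", "normal", "normal"]
y_pred = ["hate", "hate", "normal", "normal"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Here accuracy is 0.75, but the "hate" class has F1 of about 0.667 and the "normal" class 0.8, so macro-F1 is lower (11/15). This class-balanced view is why the metric is common in hate speech detection.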

Abstract

The task of automatically detecting hate speech in social media is gaining more and more attention. Given the enormous volume of content posted daily, human monitoring of hate speech is infeasible. In this work, we propose new word-level features for automatic hate speech detection (HSD): multiword expressions (MWEs). MWEs are lexical units larger than a single word that have idiomatic and compositional meanings. We propose to integrate MWE features in a deep neural network-based HSD framework. Our baseline HSD system relies on Universal Sentence Encoder (USE). To incorporate MWE features, we create a three-branch deep neural network: one branch for USE, one for MWE categories, and one for MWE embeddings. We conduct experiments on two hate speech tweet corpora with different MWE categories and with two types of MWE embeddings, word2vec and BERT. Our experiments demonstrate that the proposed…
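The three-branch design described in the abstract can be sketched as a forward pass in plain Python: each input (USE sentence vector, MWE category vector, MWE embedding) goes through its own dense layer, the branch outputs are concatenated, and a final layer with softmax produces class probabilities. Layer sizes, ReLU activations, fusion by concatenation, and random initialization are assumptions for illustration; training is omitted, and the class and function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, w: Matrix, b: Vector, relu: bool = True) -> Vector:
    """Fully connected layer y = W x + b, optionally followed by ReLU."""
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [max(0.0, v) for v in y] if relu else y


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Small random weights and zero biases (untrained)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class ThreeBranchClassifier:
    """Hypothetical model: USE branch, MWE-category branch, MWE-embedding branch."""

    def __init__(self, use_dim: int, cat_dim: int, emb_dim: int,
                 hidden: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.use_branch = init_layer(use_dim, hidden, rng)
        self.cat_branch = init_layer(cat_dim, hidden, rng)
        self.emb_branch = init_layer(emb_dim, hidden, rng)
        self.output = init_layer(3 * hidden, n_classes, rng)

    def predict_proba(self, use_vec: Vector, cat_vec: Vector, emb_vec: Vector) -> Vector:
        # Run each branch separately, then concatenate (list +) their outputs.
        fused = (dense(use_vec, *self.use_branch)
                 + dense(cat_vec, *self.cat_branch)
                 + dense(emb_vec, *self.emb_branch))
        return softmax(dense(fused, *self.output, relu=False))


# 512 matches the USE output size; 300 is a typical word2vec size (assumed here).
model = ThreeBranchClassifier(use_dim=512, cat_dim=3, emb_dim=300, hidden=16, n_classes=2)
probs = model.predict_proba([0.01] * 512, [0.0, 1.0, 0.0], [0.5] * 300)
print(len(probs), round(sum(probs), 6))  # 2 1.0
```

Keeping each feature type in its own branch lets the model learn a separate projection for inputs of very different size and scale (a 512-dimensional sentence vector versus a handful of category counts) before they are combined.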

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Residual Connection · WordPiece · Attention Dropout · Dense Connections