Comparative Analysis of Machine Learning and Deep Learning Algorithms for Detection of Online Hate Speech

Tashvik Dhamija; Anjum; Rahul Katarya

arXiv:2108.01063·cs.CL·August 3, 2021

TL;DR

This paper compares machine learning and deep learning algorithms for detecting online hate speech and reports that BERT-based sentence embeddings classified with decision trees achieve a near-perfect F1 score.

Contribution

It shows that BERT-based sentence embeddings give better hate speech detection results, measured by F1 score, than traditional NLP features.
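For contrast with dense sentence embeddings, the following minimal sketch shows one common "traditional" NLP feature: TF-IDF vectors over whitespace tokens. This summary does not list the exact conventional features the authors used, so the choice of TF-IDF, the function name `tfidf_vectors`, the whitespace tokenizer and the smoothed IDF formula are all illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of documents.

    Assumption: raw term frequency, smoothed IDF log((1 + N) / (1 + df)) + 1,
    lowercased whitespace tokens, and L2-normalised output vectors.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        # L2-normalise so that longer posts do not get larger feature values.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


# Toy usage with made-up posts (not from the paper's dataset).
vocab, vectors = tfidf_vectors(["you are great", "you are awful", "awful awful people"])
print(vocab)
```

Each dimension corresponds to one vocabulary word, so these features are sparse and purely lexical: two posts with the same meaning but different wording share few non-zero dimensions, which is the kind of limitation sentence embeddings are meant to address.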

Findings

1. BERT embeddings outperform traditional features in hate speech detection.
2. RoBERTa (robustly optimized BERT approach) sentence embeddings classified with decision trees achieve an F1 score of 0.9998.
3. BERT-based embeddings are identified as the most useful features for building a practically robust model.
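To make the reported 0.9998 concrete, here is a minimal sketch of binary F1 as the harmonic mean of precision and recall for the positive ("hate speech") class. The function name `f1_score` and the convention that label `1` means hate speech are assumptions for illustration; the example counts below are hypothetical, not the paper's test set.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 10,000 true positives, 2 false negatives, 2 false positives.
y_true = [1] * 10002 + [0] * 2
y_pred = [1] * 10000 + [0] * 2 + [1] * 2
print(round(f1_score(y_true, y_pred), 4))
```

Equivalently, F1 = 2TP / (2TP + FP + FN). In this hypothetical case, 20000 / 20004 rounds to 0.9998, so a score at that level leaves room for only about four misclassifications per 10,000 correctly detected hate-speech examples.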

Abstract

In the age of social media, users have become increasingly exposed to online hate speech. Several attempts have been made to classify hate speech using machine learning, but state-of-the-art models are not robust enough for practical applications. This is attributed to the use of primitive NLP feature engineering techniques. In this paper, we explore various feature engineering techniques, ranging from different embeddings to conventional NLP algorithms, and we also experiment with combinations of different features. In our experiments, RoBERTa (robustly optimized BERT approach) based sentence embeddings classified using decision trees give the best result, an F1 score of 0.9998. We conclude that BERT-based embeddings provide the most useful features for this problem and have the capacity to be developed into a practical, robust model.
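The pipeline described in the abstract (sentence embeddings as features, a decision tree as the classifier, optionally with several feature sets combined) can be sketched in pure Python as follows. This is not the authors' code: the embedding vectors are toy stand-ins for what a RoBERTa-based sentence encoder would produce, the `extra` column is a hypothetical conventional feature used to show feature combination by concatenation, and the tree is a minimal CART-style learner with Gini impurity, written out only so the example needs no third-party libraries (in practice a library class such as scikit-learn's `DecisionTreeClassifier` would normally be used).

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) minimising weighted Gini, or None."""
    best, best_score, n = None, gini(y), len(y)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= thr]
            right = [lab for row, lab in zip(X, y) if row[j] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_score, best = score, (j, thr)
    return best


def build_tree(X, y, depth=0, max_depth=5):
    """Grow a tree recursively; a leaf stores the majority label."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    j, thr = split
    left = [(r, lab) for r, lab in zip(X, y) if r[j] <= thr]
    right = [(r, lab) for r, lab in zip(X, y) if r[j] > thr]
    return (
        j,
        thr,
        build_tree([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth),
    )


def predict(tree, x):
    """Follow threshold tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree


# Toy stand-ins: 2-d "sentence embeddings" plus one hypothetical extra feature.
emb = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
extra = [[1.0], [0.0], [0.0], [1.0]]
X = [e + t for e, t in zip(emb, extra)]  # feature combination by concatenation
y = [1, 1, 0, 0]  # assumed labels: 1 = hate speech, 0 = not hate speech
tree = build_tree(X, y)
print([predict(tree, x) for x in X])
```

Concatenation is the simplest way to combine feature sets: each post becomes one longer vector, and the tree is free to split on whichever dimensions separate the classes best.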

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting

Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Layer Normalization · Dense Connections · Linear Warmup With Linear Decay · Residual Connection · Softmax