Comparative Analysis of Machine Learning and Deep Learning Algorithms for Detection of Online Hate Speech
Tashvik Dhamija, Anjum, Rahul Katarya

TL;DR
This paper compares machine learning and deep learning algorithms for detecting online hate speech, highlighting that RoBERTa-based sentence embeddings classified with decision trees achieve a near-perfect F1 score of 0.9998.
Contribution
It shows that BERT-based sentence embeddings yield more useful features for hate speech detection than traditional NLP feature engineering.
Findings
BERT embeddings outperform traditional features in hate speech detection
RoBERTa (robustly optimized BERT approach) sentence embeddings classified with decision trees achieve an F1 score of 0.9998
Feature engineering with advanced embeddings enhances model robustness
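
For readers unfamiliar with the metric quoted above, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch of that arithmetic in Python; the function name and the confusion counts are illustrative assumptions and are not taken from the paper.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score given true positives, false positives, false negatives."""
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
print(round(f1_from_counts(tp=90, fp=10, fn=30), 4))  # 0.8182
```

With these made-up counts, precision is 0.9 and recall is 0.75, so F1 is pulled toward the lower of the two. A score of 0.9998 therefore requires both precision and recall to be almost perfect.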
Abstract
In this age of social media, users have become increasingly exposed to online hate speech. Several attempts have been made to classify hate speech using machine learning, but the state-of-the-art models are not robust enough for practical applications. This is attributed to the use of primitive NLP feature engineering techniques. In this paper, we explore various feature engineering techniques, ranging from different embeddings to conventional NLP algorithms. We also experiment with combinations of different features. Our experiments show that RoBERTa (robustly optimized BERT approach) sentence embeddings classified using decision trees give the best result, an F1 score of 0.9998. We conclude that BERT-based embeddings provide the most useful features for this problem and have the capacity to be made into a practical, robust model.
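
The pipeline the abstract describes (sentence embeddings fed to a decision tree and scored by F1) might be sketched roughly as follows. This is not the authors' code. The names `toy_embed` and `train_and_score`, the eight example sentences, and the train/test split settings are all assumptions made for illustration. `toy_embed` is a hashed bag-of-words stand-in so the sketch runs without downloading a model; it is not RoBERTa, and a real sentence encoder would be passed in its place.

```python
import hashlib

import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def toy_embed(sentences, dim=32):
    """Hypothetical stand-in encoder: hashed bag-of-words vectors (NOT RoBERTa)."""
    vectors = np.zeros((len(sentences), dim))
    for i, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            # Stable hash so the same token always maps to the same dimension.
            idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
            vectors[i, idx] += 1.0
    return vectors


def train_and_score(sentences, labels, embed=toy_embed, seed=0):
    """Embed sentences, fit a decision tree, and return F1 on a held-out split."""
    features = embed(sentences)
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    classifier = DecisionTreeClassifier(random_state=seed)
    classifier.fit(x_train, y_train)
    return f1_score(y_test, classifier.predict(x_test))


# Made-up placeholder data; label 1 marks the abusive class.
sentences = [
    "you are awful and worthless", "nobody wants you here, get lost",
    "people like you are awful", "get lost, you worthless fool",
    "have a lovely day", "thanks for sharing this article",
    "great photo, lovely colours", "thanks, have a great weekend",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
score = train_and_score(sentences, labels)
print(f"held-out F1: {score:.4f}")
```

To move closer to the setup in the abstract, `embed` would be replaced by a function that returns RoBERTa sentence embeddings as a 2-D array with one row per sentence, and the toy data by a labelled hate speech corpus. The score printed by this toy run says nothing about the paper's reported result.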
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting
Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Layer Normalization · Dense Connections · Linear Warmup With Linear Decay · Residual Connection · Softmax
