Advancing Hate Speech Detection with Transformers: Insights from the MetaHate
Santosh Chapagain, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi

TL;DR
This paper evaluates transformer-based models for hate speech detection on MetaHate, a large, diverse social media dataset. ELECTRA performs best, with an F1 score of 0.8980, and the error analysis highlights challenges such as sarcasm and coded language.
Contribution
It provides a comprehensive evaluation of multiple transformer models on the large MetaHate dataset, highlighting ELECTRA's superior performance and analyzing common classification errors.
Findings
ELECTRA achieved the highest F1 score of 0.8980.
Transformer models outperform traditional RNNs, LSTMs, and CNNs.
Challenges include detecting sarcasm, coded language, and handling label noise.
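For readers unfamiliar with the headline metric, the following is a minimal sketch of how an F1 score is computed from predictions. It assumes a binary setup with label 1 for hate speech and F1 measured on that positive class; the summary does not state which averaging scheme the authors used, and the function name and toy labels are illustrative only, not taken from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class (assumed here to be the hate class)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (not data from MetaHate).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.4f}")
```

Because F1 balances precision and recall, it is usually more informative than plain accuracy when hateful posts are a minority of the data.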
Abstract
Hate speech is a widespread and harmful form of online discourse, encompassing slurs and defamatory posts that can have serious social, psychological, and sometimes physical impacts on targeted individuals and communities. As social media platforms such as X (formerly Twitter), Facebook, Instagram, Reddit, and others continue to facilitate widespread communication, they also become breeding grounds for hate speech, which has increasingly been linked to real-world hate crimes. Addressing this issue requires the development of robust automated methods to detect hate speech in diverse social media environments. Deep learning approaches, such as vanilla recurrent neural networks (RNNs), long short-term memory (LSTM), and convolutional neural networks (CNNs), have achieved good results but are often limited by difficulty capturing long-term dependencies and by inefficient parallelization. This study…
