HaT5: Hate Language Identification using Text-to-Text Transfer Transformer
Sana Sabah Sabry, Tosin Adewumi, Nosheen Abid, György Kovacs, Foteini Liwicki, Marcus Liwicki

TL;DR
This paper evaluates the T5 model's effectiveness in hate speech detection across diverse datasets, enhances performance with data augmentation, and emphasizes explainability and dataset quality issues.
Contribution
It introduces a novel implementation method for T5, uses a new data augmentation technique, and analyzes dataset annotation shortcomings with explainability tools.
Findings
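T5 achieved near-SoTA macro F1 on task A of two hate speech datasets: 81.66% on OLID 2019 (SoTA 82.9%) and 82.54% on HASOC 2021 (SoTA 83.05%).
Augmenting the training data with an autoregressive model improved model performance.
Annotation issues were identified in the HASOC 2021 dataset.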
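The paper reports its results as macro F1 scores. As a reminder of what that metric measures, here is a minimal sketch in plain Python: macro F1 is the unweighted mean of the per-class F1 scores, so a minority class counts as much as a majority class. The function name `macro_f1` and the toy labels are illustrative assumptions, not data or code from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy binary example (made-up labels, not taken from any dataset).
y_true = ["OFF", "OFF", "NOT", "NOT", "NOT", "NOT"]
y_pred = ["OFF", "NOT", "NOT", "NOT", "NOT", "OFF"]
score = macro_f1(y_true, y_pred)
print(f"{score:.3f}")  # 0.625: F1 is 0.5 for OFF and 0.75 for NOT
```

Because each class contributes equally, macro F1 is a common choice for imbalanced classification tasks, where plain accuracy can look high even if the minority class is poorly detected.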
Abstract
We investigate the performance of a state-of-the-art (SoTA) architecture, T5 (available on SuperGLUE), and compare it with 3 other previous SoTA architectures across 5 different tasks from 2 relatively diverse datasets. The datasets are diverse in terms of the number and types of tasks they have. To improve performance, we augment the training data by using an autoregressive model. We achieve near-SoTA results on a couple of the tasks: macro F1 scores of 81.66% for task A of the OLID 2019 dataset and 82.54% for task A of the hate speech and offensive content (HASOC) 2021 dataset, where SoTA are 82.9% and 83.05%, respectively. We perform error analysis and explain why one of the models (Bi-LSTM) makes the predictions it does by using a publicly available algorithm: Integrated Gradients (IG). This is because explainable artificial intelligence (XAI) is essential for earning the trust of…
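The abstract mentions Integrated Gradients as the explainability algorithm. The sketch below illustrates the general idea on a toy logistic model rather than on the paper's Bi-LSTM: the attribution for feature i is (x_i - x'_i) times the average gradient of the model output along the straight path from a baseline x' to the input x. The toy model, its weights, the all-zeros baseline, and every function name here are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy differentiable model: logistic regression (stand-in, not the Bi-LSTM)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model output with respect to the input x."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate the IG path integral with a midpoint Riemann sum."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [x0 + alpha * (xi - x0) for xi, x0 in zip(x, baseline)]
        grad = model_grad(point, w, b)
        avg_grad = [a + g / steps for a, g in zip(avg_grad, grad)]
    return [(xi - x0) * a for xi, x0, a in zip(x, baseline, avg_grad)]


# Hypothetical weights and input, chosen only to exercise the code.
w, b = [0.8, -0.4, 1.5], -0.2
x, baseline = [1.0, -2.0, 0.5], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)
print(attributions)
print(sum(attributions), model(x, w, b) - model(baseline, w, b))
```

The two values on the last printed line should nearly match: Integrated Gradients satisfies a completeness property, meaning the attributions sum to the difference between the model output at the input and at the baseline. For a text model, the same computation is typically applied to token embeddings rather than raw features.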
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Attention Is All You Need · Linear Layer · Adafactor · Inverse Square Root Schedule · Softmax · Residual Connection · Layer Normalization · Dense Connections · Byte Pair Encoding
