Noisy Self-Training with Data Augmentations for Offensive and Hate Speech Detection Tasks
João A. Leite, Carolina Scarton, Diego F. Silva

TL;DR
This paper investigates the effectiveness of self-training and noisy self-training with data augmentation techniques for offensive and hate speech detection, finding that while self-training improves performance, noisy approaches may decrease it.
Contribution
It provides a comprehensive evaluation of default and noisy self-training methods with various data augmentations across multiple BERT models for hate speech detection.
Findings
Self-training improves F1-macro scores by up to 1.5%.
Noisy self-training with augmentations decreases performance.
Performance gains are consistent across different model sizes.
Abstract
Online social media is rife with offensive and hateful comments, prompting the need for their automatic detection given the sheer amount of posts created every second. Creating high-quality human-labelled datasets for this task is difficult and costly, especially because non-offensive posts are significantly more frequent than offensive ones. However, unlabelled data is abundant, easier, and cheaper to obtain. In this scenario, self-training methods, using weakly-labelled examples to increase the amount of training data, can be employed. Recent "noisy" self-training approaches incorporate data augmentation techniques to ensure prediction consistency and increase robustness against noisy data and adversarial attacks. In this paper, we experiment with default and noisy self-training using three different textual data augmentation techniques across five different pre-trained BERT…
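The self-training loop the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy centroid "model", the 1-D numeric features, the `threshold` and `rounds` parameters, and the `augment` hook are all assumptions made here so the example stays self-contained (the paper uses BERT models and textual augmentations). A teacher model labels the unlabelled pool, confident predictions become weak labels, and the model is retrained on the enlarged set. In the "noisy" variant, the teacher predicts on the clean input while the retrained model sees an augmented copy.

```python
import math

def fit_centroids(examples):
    """Toy stand-in for a classifier: the mean feature value per class.
    Assumes both classes (0 = non-offensive, 1 = offensive) are present."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, y in examples:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in sums}

def predict_proba(model, x):
    """Probability of class 1: softmax over negative distances to the centroids."""
    e0 = math.exp(-abs(x - model[0]))
    e1 = math.exp(-abs(x - model[1]))
    return e1 / (e0 + e1)

def self_train(labelled, unlabelled, threshold=0.8, rounds=3, augment=None):
    """Self-training with an optional noise hook (all parameters hypothetical).

    labelled:   list of (feature, label) pairs
    unlabelled: list of features
    augment:    optional function applied to a weakly-labelled input
                before it is added to the training set (noisy variant)
    """
    train = list(labelled)
    pool = list(unlabelled)
    model = fit_centroids(train)  # initial teacher
    for _ in range(rounds):
        confident, remaining = [], []
        for x in pool:
            p = predict_proba(model, x)   # teacher sees the clean input
            label = int(p >= 0.5)
            if max(p, 1 - p) >= threshold:
                x_new = augment(x) if augment else x
                confident.append((x_new, label))
            else:
                remaining.append(x)
        if not confident:
            break                          # nothing new to learn from
        train.extend(confident)
        pool = remaining
        model = fit_centroids(train)       # retrain on labelled + weak labels
    return model, train, pool
```

Calling `self_train(labelled, unlabelled)` gives the default variant. Passing a function as `augment` gives a noisy variant in which only the retraining step sees perturbed inputs.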
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Linear Layer · Dropout · WordPiece · Adam · Attention Dropout · Linear Warmup With Linear Decay
