A Target-Aware Analysis of Data Augmentation for Hate Speech Detection

Camilla Casula; Sara Tonelli

arXiv:2410.08053·cs.CL·October 11, 2024

TL;DR

This paper examines data augmentation, including generative language models, as a way to improve hate speech detection for underrepresented target groups. It reports that combining augmentation methods improves both classification performance and fairness.

Contribution

The paper introduces a target-aware data augmentation approach that uses generative language models to reduce target imbalance in hate speech datasets.

Findings

01. Traditional data augmentation often outperforms generative models used alone.

02. Combining augmentation methods yields the best classification results.

03. F1 scores improve by more than 10% for some target categories, such as origin, religion, and disability.
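The per-category F1 figures above depend on scoring each target group separately. The following is a minimal sketch of such a breakdown, not the authors' evaluation code. It assumes binary labels (1 = hateful) and positive-class F1; the function names `f1_binary` and `f1_by_target` and the toy data are hypothetical, and the paper may use a different F1 variant.

```python
from collections import defaultdict


def f1_binary(y_true, y_pred):
    """Positive-class F1 for binary labels (1 = hateful, 0 = not hateful)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_target(examples):
    """Group (target, gold, predicted) triples by target and score each group."""
    groups = defaultdict(lambda: ([], []))
    for target, gold, pred in examples:
        groups[target][0].append(gold)
        groups[target][1].append(pred)
    return {t: f1_binary(g, p) for t, (g, p) in groups.items()}


# Toy data, invented purely for illustration.
toy = [
    ("religion", 1, 1),
    ("religion", 0, 1),
    ("religion", 1, 0),
    ("disability", 1, 1),
    ("disability", 0, 0),
]
scores = f1_by_target(toy)
```

Comparing these per-target scores between a baseline classifier and one trained on augmented data is one way to see which groups benefit from augmentation.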

Abstract

Hate speech is one of the main threats posed by the widespread use of social networks, despite efforts to limit it. Although attention has been devoted to this issue, the lack of datasets and case studies centered around scarcely represented phenomena, such as ableism or ageism, can lead to hate speech detection systems that do not perform well on underrepresented identity groups. Given the unprecedented capabilities of LLMs in producing high-quality data, we investigate the possibility of augmenting existing data with generative language models, reducing target imbalance. We experiment with augmenting 1,000 posts from the Measuring Hate Speech corpus, an English dataset annotated with target identity information, adding around 30,000 synthetic examples using both simple data augmentation methods and different types of generative models, comparing autoregressive and sequence-to-sequence…
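To make the idea of reducing target imbalance with simple augmentation concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the procedure used in the paper: the two token-level operations (random swap and random deletion) are stand-ins for whichever simple methods the authors used, the perturbations are assumed to preserve the original label, and all function names and example posts are hypothetical.

```python
import random
from collections import Counter


def random_swap(tokens, rng):
    """Swap two randomly chosen token positions (no-op for fewer than 2 tokens)."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() > p]
    return kept or [rng.choice(tokens)]


def augment_to_balance(posts, seed=0):
    """posts: list of (text, target, label) tuples.

    Returns synthetic examples only, enough to bring every target up to the
    size of the most frequent one. Labels are copied from the source post,
    which assumes the perturbation does not change the label.
    """
    rng = random.Random(seed)
    counts = Counter(target for _, target, _ in posts)
    goal = max(counts.values())
    synthetic = []
    for target, n in counts.items():
        pool = [p for p in posts if p[1] == target]
        for _ in range(goal - n):
            text, _, label = rng.choice(pool)
            op = rng.choice([random_swap, random_deletion])
            synthetic.append((" ".join(op(text.split(), rng)), target, label))
    return synthetic


# Placeholder posts, invented for illustration.
posts = [
    ("one two three", "religion", 1),
    ("four five six", "religion", 0),
    ("seven eight nine", "religion", 1),
    ("ten eleven twelve", "disability", 1),
]
synthetic = augment_to_balance(posts)
balanced = Counter(t for _, t, _ in posts + synthetic)
```

A generative model could replace the token-level operations in the inner loop; the balancing logic around it would stay the same.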

Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: Softmax · Attention Is All You Need