Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization
Helena Bonaldi, Giuseppe Attanasio, Debora Nozza, Marco Guerini

TL;DR
This paper proposes attention regularization methods that improve how well transformer models generalize when generating counter narratives to hate speech, yielding more diverse and effective responses, especially for unseen targets.
Contribution
It introduces novel attention regularization techniques that reduce overfitting in Pretrained Transformer-based Language Models (PLMs), improving diversity and generalization in hate speech counter narrative generation.
Findings
Regularized models outperform state-of-the-art approaches in most cases
Improved performance on unseen hate targets
Enhanced narrative diversity and richness
Abstract
Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models generating acceptable narratives only for hatred similar to training data, with little portability to other targets or to real-world toxic language. This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs for counter narrative generation. Overfitting to training-specific terms is then discouraged, resulting in more diverse and richer narratives. We experiment with two attention-based regularization techniques on a benchmark English dataset. Regularized models produce better counter narratives than state-of-the-art approaches in most cases,…
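The abstract does not say which two regularizers are used, so the following is only a minimal sketch of the general idea, assuming an entropy-based penalty: attention that is sharply peaked on a few tokens (for example, training-specific terms) is penalized by subtracting a scaled mean attention entropy from the task loss. The function names, the `alpha` weight, and the plain-list representation of attention rows are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution.

    `weights` is assumed to be a list of non-negative floats summing to 1.
    Zero entries contribute nothing to the entropy.
    """
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def entropy_regularized_loss(task_loss, attention_rows, alpha=0.01):
    """Illustrative regularized loss (an assumption, not the paper's formula).

    Subtracting alpha * mean entropy means that minimizing the loss
    rewards attention that is spread over more tokens.
    """
    mean_entropy = sum(attention_entropy(r) for r in attention_rows) / len(attention_rows)
    return task_loss - alpha * mean_entropy


# Two toy attention rows over four tokens: one uniform, one fully peaked.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [1.0, 0.0, 0.0, 0.0]

loss_uniform = entropy_regularized_loss(1.0, [uniform], alpha=0.1)
loss_peaked = entropy_regularized_loss(1.0, [peaked], alpha=0.1)
```

With the same task loss, the peaked row ends up with the higher regularized loss, which is the pressure that would discourage a model from attending only to a handful of terms.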
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
