Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization

Helena Bonaldi; Giuseppe Attanasio; Debora Nozza; Marco Guerini

arXiv:2309.02311 · cs.CL · September 6, 2023 · 2 cites

TL;DR

This paper proposes attention regularization methods that improve how well transformer models for hate speech counter narrative generation generalize, yielding more diverse and effective responses, especially for unseen targets.

Contribution

The paper introduces novel attention regularization techniques that reduce overfitting in pretrained language models (PLMs), improving diversity and generalization in hate speech counter narrative generation.

Findings

01. Regularized models outperform state-of-the-art approaches in most cases

02. Improved performance on unseen hate targets

03. Enhanced narrative diversity and richness

Abstract

Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models that generate acceptable narratives only for hatred similar to the training data, with little portability to other targets or to real-world toxic language. This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs for counter narrative generation. Overfitting to training-specific terms is thereby discouraged, resulting in more diverse and richer narratives. We experiment with two attention-based regularization techniques on a benchmark English dataset. Regularized models produce better counter narratives than state-of-the-art approaches in most cases,…
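
The abstract does not name the two regularization techniques, so the following is only a minimal sketch of one generic form of attention regularization: an entropy-based term added to the training loss. It assumes that spreading attention more evenly across tokens is what discourages reliance on training-specific terms. The function names and the `alpha` strength parameter are hypothetical and are not taken from the paper or its repository.

```python
import math


def softmax(scores):
    """Turn raw attention scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def mean_attention_entropy(score_rows):
    """Average entropy over query positions (one row of scores per query)."""
    entropies = [attention_entropy(softmax(row)) for row in score_rows]
    return sum(entropies) / len(entropies)


def regularized_loss(task_loss, score_rows, alpha=0.01):
    """Task loss minus a scaled entropy bonus.

    Higher attention entropy lowers the total loss, so training is nudged
    away from concentrating attention on a few tokens. `alpha` is a
    hypothetical strength hyperparameter (an assumption, not from the paper).
    """
    return task_loss - alpha * mean_attention_entropy(score_rows)


# Toy comparison: attention peaked on one token vs. spread over four tokens.
peaked = [[10.0, 0.0, 0.0, 0.0]]
uniform = [[1.0, 1.0, 1.0, 1.0]]
print(mean_attention_entropy(peaked) < mean_attention_entropy(uniform))
print(regularized_loss(2.0, uniform) < regularized_loss(2.0, peaked))
```

In a real fine-tuning setup the scores would come from the model's attention layers and the penalty would be computed with a differentiable tensor library; plain Python is used here only to keep the arithmetic visible.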

Code & Models

Repositories

milanlproc/weigh-your-own-words
PyTorch · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection