CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech

Punyajoy Saha; Kanishk Singh; Adarsh Kumar; Binny Mathew; Animesh Mukherjee

arXiv:2205.04304·cs.CL·May 10, 2022·1 cites

TL;DR

CounterGeDi is a controllable generation model that guides counterspeech production to be more polite, detoxified, and emotionally expressive, improving quality without losing relevance across multiple datasets.

Contribution

It introduces CounterGeDi, an ensemble of generative discriminators (GeDi) that steers a DialoGPT model's counterspeech generation toward specific attributes (politeness, detoxification, and emotion), improving attribute scores over a vanilla generation model.

Findings

01

Politeness scores increased by ~15%.

02

Detoxification scores increased by ~6%.

03

Emotion in counterspeech increased by at least 10%.

Abstract

Recently, many studies have tried to create generation models to assist counter speakers by providing counterspeech suggestions for combating the explosive proliferation of online hate. However, since these suggestions are from a vanilla generation model, they might not include the appropriate properties required to counter a particular hate speech instance. In this paper, we propose CounterGeDi - an ensemble of generative discriminators (GeDi) to guide the generation of a DialoGPT model toward more polite, detoxified, and emotionally laden counterspeech. We generate counterspeech using three datasets and observe significant improvement across different attribute scores. The politeness and detoxification scores increased by around 15% and 6% respectively, while the emotion in the counterspeech increased by at least 10% across all the datasets. We also experiment with triple-attribute…
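
The guidance idea in the abstract can be illustrated with a minimal sketch. It assumes the usual GeDi-style reweighting, in which the base model's next-token probability is multiplied by each discriminator's posterior for the desired attribute (raised to a strength `omega`) and then renormalized. The function names, the toy vocabulary, and all numbers below are hypothetical and chosen only for illustration; this is not the code from the hate-alert/countergedi repository, and the paper's exact way of combining attributes may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_distribution(base_logits, attribute_posteriors, omega=1.0):
    """Reweight a base next-token distribution with attribute posteriors.

    base_logits: next-token logits from the base generator (assumed here
        to stand in for a DialoGPT-like model).
    attribute_posteriors: one list per attribute; entry i is the assumed
        probability that choosing token i keeps the text in the desired
        class (e.g. polite, non-toxic). Values must be in (0, 1].
    omega: guidance strength; 0 leaves the base distribution unchanged.
    """
    base_log_probs = [math.log(p) for p in softmax(base_logits)]
    scores = []
    for i, log_p in enumerate(base_log_probs):
        # Sum of log-posteriors = product of posteriors across attributes.
        bonus = sum(math.log(post[i]) for post in attribute_posteriors)
        scores.append(log_p + omega * bonus)
    return softmax(scores)


# Toy example with a three-token vocabulary (all values are made up).
vocab = ["idiot", "please", "understand"]
base_logits = [2.0, 1.0, 1.0]      # base model prefers the toxic token
polite = [0.05, 0.90, 0.60]        # hypothetical politeness posteriors
non_toxic = [0.02, 0.95, 0.90]     # hypothetical non-toxicity posteriors

base = softmax(base_logits)
guided = guided_distribution(base_logits, [polite, non_toxic], omega=1.0)
best = vocab[guided.index(max(guided))]
```

In this toy setting the base model ranks the toxic token first, while the guided distribution shifts most of the probability mass to tokens that both discriminators favor. Raising `omega` strengthens the steering at the cost of moving further from the base model's preferences.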

Code & Models

Repositories

hate-alert/countergedi
PyTorch, Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection