Outcome-Constrained Large Language Models for Countering Hate Speech

Lingzi Hong; Pengcheng Luo; Eduardo Blanco; Xiaoying Song

arXiv:2403.17146·cs.CL·October 2, 2024·2 cites


TL;DR

This paper develops methods for generating counterspeech with large language models under conversation-outcome constraints, targeting low conversation incivility and non-hateful hater reentry, and evaluates how well the methods achieve those outcomes.

Contribution

It introduces outcome-constrained counterspeech generation methods using LLMs, including instruction prompts, finetuning, and reinforcement learning, to improve online hate speech mitigation.

Findings

01 Methods effectively steer counterspeech toward desired outcomes

02 Different models produce varying quality and style of counterspeech

03 Outcome constraints influence counterspeech effectiveness

Abstract

Automatic counterspeech generation methods have been developed to assist efforts in combating hate speech. Existing research focuses on generating counterspeech with linguistic attributes such as being polite, informative, and intent-driven. However, the real impact of counterspeech in online environments is seldom considered. This study aims to develop methods for generating counterspeech constrained by conversation outcomes and evaluate their effectiveness. We experiment with large language models (LLMs) to incorporate into the text generation process two desired conversation outcomes: low conversation incivility and non-hateful hater reentry. Specifically, we experiment with instruction prompts, LLM finetuning, and LLM reinforcement learning (RL). Evaluation results show that our methods effectively steer the generation of counterspeech toward the desired outcomes. Our analyses,…
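As a rough illustration of the instruction-prompt variant mentioned in the abstract, the sketch below assembles a prompt that states the desired conversation outcomes alongside the comment to be answered. The function name `build_prompt`, the outcome keys, and all prompt wording are assumptions made for this example; the authors' actual prompts are not given in this summary.

```python
# Hypothetical instruction wording for the two outcomes named in the abstract.
# These strings are illustrative assumptions, not the paper's prompts.
OUTCOME_INSTRUCTIONS = {
    "low_incivility": (
        "The reply should make uncivil follow-up comments in the "
        "conversation less likely."
    ),
    "non_hateful_reentry": (
        "The reply should encourage the original author, if they respond, "
        "to do so without hate."
    ),
}


def build_prompt(hate_comment: str, outcomes: list[str]) -> str:
    """Build an outcome-constrained instruction prompt for an LLM."""
    unknown = [o for o in outcomes if o not in OUTCOME_INSTRUCTIONS]
    if unknown:
        raise ValueError(f"unknown outcome(s): {unknown}")
    lines = ["Write a short counterspeech reply to the comment below."]
    # One bullet per requested outcome constraint.
    lines += [f"- {OUTCOME_INSTRUCTIONS[o]}" for o in outcomes]
    lines.append(f"Comment: {hate_comment}")
    lines.append("Reply:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("<comment text>", ["low_incivility", "non_hateful_reentry"]))
```

The resulting string would be passed to whichever LLM is being evaluated. The finetuning and reinforcement learning variants described in the abstract would instead build the outcome signal into training, which this sketch does not cover.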



Taxonomy

Topics: Hate Speech and Cyberbullying Detection