HateRephrase: Zero- and Few-Shot Reduction of Hate Intensity in Online Posts using Large Language Models

Vibhor Agarwal; Yu Chen; Nishanth Sastry

arXiv:2310.13985 · cs.CL · October 24, 2023 · 1 citation

Open Access

TL;DR

This paper explores using large language models to rephrase potential hate speech before posting, effectively reducing hate intensity while preserving meaning, and demonstrates GPT-3.5's superior performance over baselines through comprehensive experiments.

Contribution

It introduces a novel approach of preemptively rephrasing hate speech using LLMs, outperforming existing baselines and even human rephrasings in reducing hate intensity.

Findings

01 GPT-3.5 outperforms baselines and open-source models.

02 Few-shot prompting yields the best rephrasings.

03 Human evaluations favor GPT-3.5 rephrasings over ground truth.

Abstract

Hate speech has become pervasive in today's digital age. Although there has been considerable research to detect hate speech or generate counter speech to combat hateful views, these approaches still cannot completely eliminate the potentially harmful societal consequences of hate speech: even when detected, hate speech often cannot be taken down or is often not taken down enough, and it unfortunately spreads quickly, often much faster than any generated counter speech. This paper investigates a relatively new yet simple and effective approach: suggesting a rephrasing of potential hate speech content before the post is even made. We show that Large Language Models (LLMs) perform well on this task, outperforming state-of-the-art baselines such as BART-Detox. We develop 4 different prompts based on task description, hate definition, few-shot demonstrations and…
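As a rough illustration of how prompts built from a task description, a hate definition, and few-shot demonstrations might be assembled, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `build_prompt`, the wording of `TASK_DESCRIPTION`, and the `Post:` / `Rephrased:` layout are hypothetical and are not the paper's actual prompts, which this page does not reproduce.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical task wording; the paper's real prompt text is not given on this page.
TASK_DESCRIPTION = (
    "Rephrase the following post to reduce its hate intensity "
    "while preserving its original meaning."
)


def build_prompt(
    post: str,
    demonstrations: Sequence[Tuple[str, str]] = (),
    hate_definition: Optional[str] = None,
) -> str:
    """Assemble a zero- or few-shot rephrasing prompt as a single string.

    With no demonstrations the result is a zero-shot prompt; each
    (original, rephrased) pair adds one few-shot demonstration.
    """
    parts = [TASK_DESCRIPTION]
    if hate_definition:
        # Optional component: a definition of hate speech supplied by the caller.
        parts.append(f"Definition of hate speech: {hate_definition}")
    for original, rephrased in demonstrations:
        parts.append(f"Post: {original}\nRephrased: {rephrased}")
    # The post to rephrase goes last, leaving the answer slot open for the model.
    parts.append(f"Post: {post}\nRephrased:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Placeholder strings stand in for real posts and their rephrasings.
    demos = [("<original post 1>", "<less hateful rephrasing 1>")]
    print(build_prompt("<post to rephrase>", demonstrations=demos))
```

The resulting string could then be sent to whichever LLM is being evaluated; the model call itself is left out because it depends on the provider and client library in use.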

Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: Multi-Head Attention · Attention Is All You Need · Attention Dropout · Softmax · Dense Connections · Cosine Annealing · Adam · Residual Connection