Decoding Hate: Exploring Language Models' Reactions to Hate Speech
Paloma Piot, Javier Parapar

TL;DR
This study examines how seven advanced language models respond to hate speech, analyzing their reactions and discussing mitigation strategies to prevent hate speech generation in AI systems.
Contribution
It provides a comparative analysis of LLMs' responses to hate speech and explores effective mitigation techniques like fine-tuning and guideline guardrailing.
Findings
Models show varied responses to hate speech inputs.
Fine-tuning and guardrailing can reduce hate speech generation.
Models respond differently to politically correct framing.
Abstract
Hate speech is a harmful form of online expression, often manifesting as derogatory posts. It is a significant risk in digital environments. With the rise of Large Language Models (LLMs), there is concern about their potential to replicate hate speech patterns, given their training on vast amounts of unmoderated internet data. Understanding how LLMs respond to hate speech is crucial for their responsible deployment. However, the behaviour of LLMs towards hate speech has received limited comparative study. This paper investigates the reactions of seven state-of-the-art LLMs (LLaMA 2, Vicuna, LLaMA 3, Mistral, GPT-3.5, GPT-4, and Gemini Pro) to hate speech. Through qualitative analysis, we aim to reveal the spectrum of responses these models produce, highlighting their capacity to handle hate speech inputs. We also discuss strategies to mitigate hate speech generation by LLMs, particularly through…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Cosine Annealing · Layer Normalization · Attention Is All You Need · Linear Warmup With Cosine Annealing · Adam · Linear Layer · Residual Connection · Weight Decay
