Generative AI may backfire for counterspeech
Dominik Bär, Abdurahman Maarouf, Stefan Feuerriegel

TL;DR
This study evaluates whether generative AI can produce effective counterspeech against online hate speech. It finds that non-contextualized warning messages reduce hate speech, whereas contextualized LLM-generated messages may be ineffective or even counterproductive.
Contribution
The paper provides the first large-scale field experiment assessing LLM-generated counterspeech effectiveness in real social media settings.
Findings
Non-contextualized counterspeech with warnings reduces hate speech
Contextualized AI-generated counterspeech may backfire or be ineffective
Large-scale Twitter experiment with 2,664 participants
Abstract
Online hate speech poses a serious threat to individual well-being and societal cohesion. A promising solution to curb online hate speech is counterspeech. Counterspeech is aimed at encouraging users to reconsider hateful posts by direct replies. However, current methods lack scalability due to the need for human intervention or fail to adapt to the specific context of the post. A potential remedy is the use of generative AI, specifically large language models (LLMs), to write tailored counterspeech messages. In this paper, we analyze whether contextualized counterspeech generated by state-of-the-art LLMs is effective in curbing online hate speech. To do so, we conducted a large-scale, pre-registered field experiment (N=2,664) on the social media platform Twitter/X. Our experiment followed a 2x2 between-subjects design and, additionally, a control condition with no counterspeech. On the…
Taxonomy
Topics: Science, Research, and Medicine
