TL;DR
This study investigates how counterspeech affects newcomers to hate communities on Reddit. It finds that less toxic counterspeech can reduce newcomer participation, whereas toxic counterspeech may escalate hostility.
Contribution
The paper introduces an LLM-based counterspeech detection method and uses it to analyze how counterspeech affects engagement and hostility levels in hate communities.
Findings
Counterspeech is less toxic than hate speech but more toxic than other discourse in hate subreddits.
Receiving counterspeech reduces the likelihood that newcomers to hate subreddits continue to post.
Toxic counterspeech increases ongoing hostility from hate users.
Abstract
Counterspeech has gained attention as a strategy to reduce hate speech on social media. Although previous studies suggest that counterspeech can reduce hate speech, little is known about its effects on participation in online hate communities. Relatedly, we lack an understanding about the degree of hostility in counterspeech. Hostile counterspeech may increase online conflict, potentially hardening the positions of hate adherents, and further eroding online environments. Here, we analyzed the effect of counterspeech on 16,513 newcomers across 104 hate subreddits (forums within Reddit.com). We devised an LLM-based counterspeech detection approach that outperforms specialized models trained on existing datasets, then examined the presence, and effects of, hostility. While counterspeech comments are less toxic than hate speech comments, they are almost twice as toxic as other discourse…
