Taming Toxic Talk: Using chatbots to intervene with users posting toxic comments
Jeremy Foote, Deepak Kumar, Bedadyuti Jha, Ryan Funkhouser, Loizos Bitsikokos, Hitesh Goel, and Hsuen-Chi Chiu

TL;DR
This study investigates whether AI chatbots can rehabilitate users who post toxic comments online, finding that while users engage positively, there is no significant reduction in toxic behavior over a month.
Contribution
First large-scale field experiment assessing AI chatbot intervention effectiveness for online toxicity, providing insights into rehabilitative approaches.
Findings
Participants engaged in good-faith conversations.
Many expressed remorse or desire to change.
No significant reduction in toxic behavior observed.
Abstract
Generative AI chatbots have proven surprisingly effective at persuading people to change their beliefs and attitudes in lab settings. However, the practical implications of these findings are not yet clear. In this work, we explore the impact of rehabilitative conversations with generative AI chatbots on users who share toxic content online. Toxic behaviors, such as insults or threats of violence, are widespread in online communities. Strategies to deal with toxic behavior are typically punitive, such as removing content or banning users. Rehabilitative approaches are rarely attempted, in part due to the emotional and psychological cost of engaging with aggressive users. In collaboration with seven large Reddit communities, we conducted a large-scale field experiment (N=893) to invite people who had recently posted toxic content to participate in conversations with AI chatbots. A…
Taxonomy
Topics: AI in Service Interactions · Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts
