Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch

Prarabdh Shukla; Wei Yin Chong; Yash Patel; Brennan Schaffner; Danish Pruthi; Arjun Bhagoji

arXiv:2506.07667·cs.CL·June 11, 2025

TL;DR

This study audits Twitch's AutoMod system and finds that it often fails to detect hate speech while blocking benign content, which points to a need for more context-aware moderation.

Contribution

The paper provides a comprehensive empirical evaluation of Twitch's AutoMod, exposing its limitations in detecting hate speech and understanding context.

Findings

01

AutoMod misses up to 94% of hateful comments

02

AutoMod relies heavily on slurs for moderation

03

Benign content with sensitive words is often blocked

Abstract

To meet the demands of content moderation, online platforms have resorted to automated systems. Newer forms of real-time engagement (e.g., users commenting on live streams) on platforms like Twitch exert additional pressures on the latency expected of such moderation systems. Despite their prevalence, relatively little is known about the effectiveness of these systems. In this paper, we conduct an audit of Twitch's automated moderation tool (AutoMod) to investigate its effectiveness in flagging hateful content. For our audit, we create streaming accounts to act as siloed test beds, and interface with the live chat using Twitch's APIs to send over 107,000 comments collated from 4 datasets. We measure AutoMod's accuracy in flagging blatantly hateful content containing misogyny, racism, ableism and homophobia. Our experiments reveal that a large fraction…
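
The kind of measurement the abstract describes (send labeled comments, record whether the moderation tool flags each one, then score the result) can be sketched as below. This is a minimal illustration, not the authors' code: the `AuditRecord` type, its field names, and both metric functions are hypothetical, the sample data is made up with placeholder text, and no Twitch API call is shown.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AuditRecord:
    """One comment sent during an audit (hypothetical structure)."""
    text: str
    is_hateful: bool   # ground-truth label taken from the source dataset
    was_blocked: bool  # whether the moderation tool held or blocked it


def miss_rate(records: Sequence[AuditRecord]) -> float:
    """Fraction of hateful comments that were NOT blocked."""
    hateful = [r for r in records if r.is_hateful]
    if not hateful:
        return 0.0
    missed = sum(1 for r in hateful if not r.was_blocked)
    return missed / len(hateful)


def false_block_rate(records: Sequence[AuditRecord]) -> float:
    """Fraction of benign comments that WERE blocked."""
    benign = [r for r in records if not r.is_hateful]
    if not benign:
        return 0.0
    blocked = sum(1 for r in benign if r.was_blocked)
    return blocked / len(benign)


# Made-up sample with placeholder text, for illustration only.
sample = [
    AuditRecord("hateful placeholder 1", is_hateful=True, was_blocked=False),
    AuditRecord("hateful placeholder 2", is_hateful=True, was_blocked=False),
    AuditRecord("hateful placeholder 3", is_hateful=True, was_blocked=True),
    AuditRecord("hateful placeholder 4", is_hateful=True, was_blocked=False),
    AuditRecord("benign placeholder 1", is_hateful=False, was_blocked=True),
    AuditRecord("benign placeholder 2", is_hateful=False, was_blocked=False),
]

if __name__ == "__main__":
    print(f"miss rate: {miss_rate(sample):.2f}")
    print(f"false block rate: {false_block_rate(sample):.2f}")
```

Reporting the two rates separately keeps the two failure modes listed under Findings apart: hateful comments that pass through, and benign comments that get blocked.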

Code & Models

Repositories

weiyinc11/HateSpeechModerationTwitch
none · Official

Videos

Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Misinformation and Its Impacts