Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch
Prarabdh Shukla, Wei Yin Chong, Yash Patel, Brennan Schaffner, Danish Pruthi, Arjun Bhagoji

TL;DR
This study audits Twitch's AutoMod system, revealing it often fails to detect hate speech and blocks benign content, highlighting the need for context-aware moderation improvements.
Contribution
The paper provides a comprehensive empirical evaluation of Twitch's AutoMod, exposing its limitations in detecting hate speech and understanding context.
Findings
AutoMod misses up to 94% of hateful comments
AutoMod relies heavily on slurs for moderation
Benign content with sensitive words is often blocked
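As a rough illustration of how figures like the miss rate above could be derived from audit logs, the sketch below computes two rates: the share of hateful comments that were not blocked, and the share of benign comments that were. The record type, function names, and toy data are all hypothetical and are not taken from the paper. It also assumes each test comment carries a ground-truth label and a blocked/not-blocked outcome.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AuditRecord:
    """One test comment sent to the chat (hypothetical structure)."""
    text: str
    is_hateful: bool   # ground-truth label from the source dataset
    was_blocked: bool  # whether the moderation tool held the comment


def miss_rate(records: Sequence[AuditRecord]) -> float:
    """Fraction of hateful comments that were NOT blocked."""
    hateful = [r for r in records if r.is_hateful]
    if not hateful:
        raise ValueError("no hateful comments in the sample")
    return sum(not r.was_blocked for r in hateful) / len(hateful)


def false_block_rate(records: Sequence[AuditRecord]) -> float:
    """Fraction of benign comments that WERE blocked."""
    benign = [r for r in records if not r.is_hateful]
    if not benign:
        raise ValueError("no benign comments in the sample")
    return sum(r.was_blocked for r in benign) / len(benign)


# Toy placeholder data for illustration only; these are not results from the study.
TOY_RECORDS = [
    AuditRecord("hateful example 1", is_hateful=True, was_blocked=True),
    AuditRecord("hateful example 2", is_hateful=True, was_blocked=False),
    AuditRecord("hateful example 3", is_hateful=True, was_blocked=False),
    AuditRecord("hateful example 4", is_hateful=True, was_blocked=False),
    AuditRecord("benign example 1", is_hateful=False, was_blocked=True),
    AuditRecord("benign example 2", is_hateful=False, was_blocked=False),
    AuditRecord("benign example 3", is_hateful=False, was_blocked=False),
    AuditRecord("benign example 4", is_hateful=False, was_blocked=False),
]

if __name__ == "__main__":
    print(f"miss rate: {miss_rate(TOY_RECORDS):.2f}")
    print(f"false block rate: {false_block_rate(TOY_RECORDS):.2f}")
```

Reporting the two rates separately keeps under-blocking (hate speech that passes through) distinct from over-blocking (benign content that is held), which are the two failure modes the findings describe.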
Abstract
To meet the demands of content moderation, online platforms have resorted to automated systems. Newer forms of real-time engagement (e.g., users commenting on live streams) on platforms like Twitch exert additional pressures on the latency expected of such moderation systems. Despite their prevalence, relatively little is known about the effectiveness of these systems. In this paper, we conduct an audit of Twitch's automated moderation tool (AutoMod) to investigate its effectiveness in flagging hateful content. For our audit, we create streaming accounts to act as siloed test beds, and interface with the live chat using Twitch's APIs to send over comments collated from datasets. We measure AutoMod's accuracy in flagging blatantly hateful content containing misogyny, racism, ableism and homophobia. Our experiments reveal that a large fraction…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Misinformation and Its Impacts
