The Enforcement and Feasibility of Hate Speech Moderation on Twitter
Manuel Tonneau, Dylan Thurgood, Diyi Liu, Niyati Malhotra, Victor Orozco-Olvera, Ralph Schroeder, Scott A. Hale, Manoel Horta Ribeiro, Paul Röttger, Samuel P. Fraiberger

TL;DR
This study audits Twitter's hate speech moderation, finding that most hateful content remains online, and analyzes the technical and institutional factors that affect enforcement.
Contribution
It provides a comprehensive global audit of hate speech enforcement on Twitter, highlighting technical limitations and institutional resource allocation issues.
Findings
80% of hateful tweets remain online after five months
Automated detection systems struggle with false positives but aid human review
Reducing user exposure to hate speech is economically feasible with current moderation strategies
Abstract
Online hate speech is associated with substantial social harms, yet it remains unclear how consistently platforms enforce hate speech policies or whether enforcement is feasible at scale. We address these questions through a global audit of hate speech moderation on Twitter (now X). Using a complete 24-hour snapshot of public tweets, we construct representative samples comprising 540,000 tweets annotated for hate speech by trained annotators across eight major languages. Five months after posting, 80% of hateful tweets remain online, including explicitly violent hate speech. Such tweets are no more likely to be removed than non-hateful tweets, with neither severity nor visibility increasing the likelihood of removal. We then examine whether these enforcement gaps reflect technical limits of large-scale moderation systems. While fully automated detection systems cannot reliably identify…
