To Act or React: Investigating Proactive Strategies For Online Community Moderation
Hussam Habib, Maaz Bin Musa, Fareed Zaffar, Rishab Nithyanand

TL;DR
This paper explores proactive moderation strategies on Reddit. It uses explainable machine learning to predict which communities are at risk of becoming hateful or dangerous, with the goal of enabling earlier intervention and reducing harmful discourse.
Contribution
It introduces a predictive framework for identifying at-risk communities before they become problematic, enhancing moderation effectiveness with explainable AI insights.
Findings
Communities evolve in ways that can be predicted months in advance.
Explainable machine learning identifies key predictors of dangerous community development.
Proactive moderation can potentially reduce the spread of hateful content.
Abstract
Reddit administrators have generally struggled to prevent or contain such discourse for several reasons, including: (1) the inability of a handful of human administrators to track and react to millions of posts and comments per day and (2) fear of backlash as a consequence of administrative decisions to ban or quarantine hateful communities. Consequently, as shown in our background research, administrative actions (community bans and quarantines) are often taken in reaction to media pressure after offensive discourse within a community spills into the real world with serious consequences. In this paper, we investigate the feasibility of proactive moderation on Reddit -- i.e., proactively identifying communities at risk of committing offenses that previously resulted in bans for other communities. Proactive moderation strategies show promise for two reasons: (1) they have potential…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Social Media and Politics · Spam and Phishing Detection
