The Big Ban Theory: A Pre- and Post-Intervention Dataset of Online Content Moderation Actions
Aldo Cerulli, Lorenzo Cima, Benedetta Tessa, Serena Tardelli, Stefano Cresci

TL;DR
The paper introduces The Big Ban Theory dataset, a comprehensive collection of online content moderation interventions, enabling systematic analysis of their effects across diverse cases to improve research reproducibility.
Contribution
It provides the first large-scale, standardized dataset of 25 moderation interventions with pre- and post-intervention user activity data for systematic research.
Findings
The dataset covers over 339K users and nearly 39M messages.
Standardized metadata allows for consistent comparisons.
Supports research on moderation effects and biases.
Abstract
Online platforms rely on moderation interventions to curb harmful behavior such as hate speech, toxicity, and the spread of mis- and disinformation. Yet research on the effects and possible biases of such interventions faces multiple limitations. For example, existing works frequently focus on a single intervention or only a few, due to the absence of comprehensive datasets. As a result, researchers must typically collect the necessary data for each new study, which limits opportunities for systematic comparisons. To overcome these challenges, we introduce The Big Ban Theory (TBBT), a large dataset of moderation interventions. TBBT covers 25 interventions of varying type, severity, and scope, comprising in total over 339K users and nearly 39M posted messages. For each intervention, we provide standardized metadata and pseudonymized user activity collected three months before and after its…
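As a rough illustration of the pre- and post-intervention design described in the abstract, the sketch below counts each user's messages in equal windows on either side of an intervention. It is not the authors' code and does not reflect the actual TBBT schema: the record layout (pseudonymized user ID plus timestamp), the function name `pre_post_counts`, the 90-day approximation of "three months", and the example date are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def pre_post_counts(messages, intervention_time, window_days=90):
    """Count each user's messages in equal windows before and after an intervention.

    `messages` is an iterable of (user_id, timestamp) pairs. This layout is
    hypothetical and is NOT the TBBT schema. `window_days=90` is an assumed
    approximation of the three-month window mentioned in the abstract.
    """
    window = timedelta(days=window_days)
    pre, post = Counter(), Counter()
    for user_id, ts in messages:
        if intervention_time - window <= ts < intervention_time:
            pre[user_id] += 1
        elif intervention_time <= ts < intervention_time + window:
            post[user_id] += 1
        # Messages outside both windows are ignored.
    users = sorted(set(pre) | set(post))
    return {u: (pre[u], post[u]) for u in users}


# Made-up example data; the date is arbitrary, not a real intervention.
intervention = datetime(2021, 1, 15)
messages = [
    ("u1", datetime(2020, 12, 1)),
    ("u1", datetime(2021, 1, 1)),
    ("u1", datetime(2021, 2, 10)),
    ("u2", datetime(2021, 3, 1)),
    ("u2", datetime(2019, 1, 1)),  # outside both windows
]
result = pre_post_counts(messages, intervention)
print(result)  # {'u1': (2, 1), 'u2': (0, 1)}
```

Using the same window length on both sides keeps the before and after counts directly comparable for each user, which is the kind of consistent comparison the standardized metadata is meant to support.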
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Spam and Phishing Detection
