The Big Ban Theory: A Pre- and Post-Intervention Dataset of Online Content Moderation Actions

Aldo Cerulli; Lorenzo Cima; Benedetta Tessa; Serena Tardelli; Stefano Cresci

arXiv:2601.11128 · cs.SI · January 28, 2026

Open Access

TL;DR

The paper introduces The Big Ban Theory dataset, a comprehensive collection of online content moderation interventions, enabling systematic analysis of their effects across diverse cases to improve research reproducibility.

Contribution

It provides the first large-scale, standardized dataset of 25 moderation interventions with pre- and post-intervention user activity data for systematic research.

Findings

01 The dataset includes over 339K users and nearly 39M messages.

02 Standardized metadata allows consistent comparisons across interventions.

03 The dataset supports research on the effects and possible biases of moderation.

Abstract

Online platforms rely on moderation interventions to curb harmful behavior such as hate speech, toxicity, and the spread of mis- and disinformation. Yet research on the effects and possible biases of such interventions faces multiple limitations. For example, existing works frequently focus on a single intervention or only a few, due to the absence of comprehensive datasets. As a result, researchers must typically collect the necessary data for each new study, which limits opportunities for systematic comparisons. To overcome these challenges, we introduce The Big Ban Theory (TBBT), a large dataset of moderation interventions. TBBT covers 25 interventions of varying type, severity, and scope, comprising in total over 339K users and nearly 39M posted messages. For each intervention, we provide standardized metadata and pseudonymized user activity collected three months before and after its…
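As a rough illustration of the pre- and post-intervention design described in the abstract, the sketch below counts one user's messages in a window before and after an intervention date. It does not use the TBBT schema or any TBBT data: the function name `split_pre_post`, the sample timestamps, and the approximation of "three months" as 90 days are all assumptions made for this example.

```python
from datetime import datetime, timedelta

# Assumption: a three-month window is approximated as 90 days.
WINDOW = timedelta(days=90)


def split_pre_post(timestamps, intervention_time, window=WINDOW):
    """Count messages in the window before and after an intervention.

    A message posted exactly at the intervention time counts as "post".
    Messages outside both windows are ignored.
    """
    pre = sum(
        1 for t in timestamps
        if intervention_time - window <= t < intervention_time
    )
    post = sum(
        1 for t in timestamps
        if intervention_time <= t < intervention_time + window
    )
    return pre, post


# Hypothetical timestamps for a single user, not taken from TBBT.
intervention = datetime(2020, 6, 1)
messages = [
    datetime(2020, 1, 15),  # earlier than the pre-intervention window
    datetime(2020, 4, 10),  # pre-intervention
    datetime(2020, 5, 30),  # pre-intervention
    datetime(2020, 6, 2),   # post-intervention
    datetime(2020, 12, 1),  # later than the post-intervention window
]

pre_count, post_count = split_pre_post(messages, intervention)
print(pre_count, post_count)
```

Comparing the two counts per user gives a simple view of how activity changed around an intervention. A real analysis would need the dataset's actual field names and window boundaries.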

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Spam and Phishing Detection