Multi-Platform Aggregated Dataset of Online Communities (MADOC)
Marija Mitrović Dankulov, Aleksandar Tomašević, Slobodan Maletić, Miroslav Anđelković, Ana Vranić, Darja Cvetković, Boris Stupovski, Dušan Vudragović, Sara Major, Aleksandar Bogojević

TL;DR
MADOC is a large, standardized, cross-platform dataset of online communities enabling research on social dynamics, toxicity, moderation, and migration across Bluesky, Koo, Reddit, and Voat.
Contribution
The paper introduces a FAIR-compliant dataset that aggregates data from Bluesky, Koo, Reddit, and Voat into standardized formats for social science research.
Findings
Enables comparative analysis of toxic behavior evolution.
Supports research on content moderation impacts.
Facilitates platform migration studies.
Abstract
The Multi-platform Aggregated Dataset of Online Communities (MADOC) is a comprehensive dataset that facilitates computational social science research by providing FAIR-compliant standardized access to cross-platform analysis of online social dynamics. MADOC aggregates and standardizes data from Bluesky, Koo, Reddit, and Voat (2012-2024), containing 18.9 million posts, 236 million comments, and 23.1 million unique users. The dataset enables comparative studies of toxic behavior evolution across platforms through standardized interaction records and sentiment analysis. By providing UUID-anonymized user histories and temporal alignment of banned communities' activity patterns, MADOC supports research on content moderation impacts and platform migration trends. Distributed via Zenodo with persistent identifiers and Python/R toolkits, the dataset adheres to FAIR principles while addressing…
Taxonomy
Topics: Caching and Content Delivery
