Discord Unveiled: A Comprehensive Dataset of Public Communication (2015-2024)
Yan Aquino, Pedro Bento, Arthur Buzelin, Lucas Dayrell, Samira Malaquias, Caio Santana, Victoria Estanislau, Pedro Dutenhefner, Guilherme H. G. Evangelista, Luisa G. Porfírio, Caio Souza Grossi, Pedro B. Rigueira, Virgilio Almeida, Gisele L. Pappa, Wagner Meira Jr

TL;DR
This paper introduces Discord Unveiled, the largest publicly available dataset of Discord communications from 2015 to 2024, enabling extensive analysis of online community dynamics, moderation, and social trends.
Contribution
It provides the most comprehensive, anonymized dataset of Discord public server messages, covering over 2 billion messages from 4.74 million users across 3,167 servers, facilitating advanced social science research.
Findings
Significant growth in user engagement and bot use over time.
Linguistic diversity, with messages in English, Spanish, French, and Portuguese.
Emergence of community themes beyond gaming, such as art and memes.
Abstract
Discord has evolved from a gaming-focused communication tool into a versatile platform supporting diverse online communities. Despite its large user base and active public servers, academic research on Discord remains limited because its data is hard to access. This paper introduces Discord Unveiled: A Comprehensive Dataset of Public Communication (2015-2024), the most extensive collection of Discord public server data to date. The dataset comprises over 2.05 billion messages from 4.74 million users across 3,167 public servers, representing approximately 10% of the servers listed in Discord's Discovery feature. Spanning from Discord's launch in 2015 to the end of 2024, it offers a robust temporal and thematic framework for analyzing decentralized moderation, community governance, information dissemination, and social dynamics. Data was collected through Discord's public API, adhering to ethical…
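The abstract states only that the data came from Discord's public API; the collection details are cut off here. As a hedged illustration of what API-based collection typically involves, the sketch below walks a channel's history with cursor pagination in the style of Discord's "Get Channel Messages" endpoint (GET /channels/{channel.id}/messages with a `before` cursor and `limit` of at most 100). It is not the authors' pipeline. `iter_channel_messages` and the `fetch_page` callable are hypothetical names, and HTTP calls, authentication, rate-limit handling, and anonymization are deliberately left out.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical page fetcher: (channel_id, before_cursor, limit) -> messages, newest first.
# In a real collector this would wrap GET /channels/{channel.id}/messages?before=...&limit=...
FetchPage = Callable[[str, Optional[str], int], List[Dict]]


def iter_channel_messages(
    channel_id: str, fetch_page: FetchPage, page_size: int = 100
) -> Iterator[Dict]:
    """Yield every message in a channel, newest to oldest, using a `before` cursor."""
    if not 1 <= page_size <= 100:
        raise ValueError("Discord accepts a limit between 1 and 100 per request")
    before: Optional[str] = None  # no cursor: start from the most recent message
    while True:
        page = fetch_page(channel_id, before, page_size)
        if not page:
            return  # history exhausted
        yield from page
        # The oldest message on this page becomes the cursor for the next request.
        before = page[-1]["id"]
        if len(page) < page_size:
            return  # a short page means no older messages remain
```

Passing the fetcher in as a function keeps the pagination logic testable offline with a stub, and leaves rate limiting and retries to whatever HTTP client wraps the real endpoint.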
Taxonomy
Topics: Public Relations and Crisis Communication · Computational and Text Analysis Methods
