A Dataset of State-Censored Tweets
Tuğrulcan Elmas, Rebekah Overdorf, Karl Aberer

TL;DR
This paper introduces a large, publicly available dataset of tweets and accounts censored between 2012 and July 2020, enabling research on government censorship, hate speech, and social media dynamics.
Contribution
The authors provide the first extensive dataset of state-censored tweets and accounts, along with an exploratory analysis to facilitate censorship and social media research.
Findings
The dataset includes 583,437 censored tweets by 155,715 users and 4,301 fully censored accounts.
Supplemental data comprises 22,083,759 tweets: all tweets by users with at least one censored tweet, plus other users' retweets of those users.
The dataset supports research on censorship effects, hate speech, and social media behavior.
Abstract
Many governments impose traditional censorship methods on social media platforms. Instead of removing the content completely, many social media companies, including Twitter, only withhold it from the requesting country. The content therefore remains accessible outside the censored region, which provides an excellent setting in which to study government censorship on social media. We mine such content using the Internet Archive's Twitter Stream Grab. We release a dataset of 583,437 tweets by 155,715 users that were censored between 2012 and July 2020. We also release 4,301 accounts that were censored in their entirety. Additionally, we release a set of 22,083,759 supplemental tweets made up of all tweets by users with at least one censored tweet as well as instances of other users retweeting the censored user. We provide an exploratory analysis of this dataset. Our dataset will not only aid…
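The abstract does not spell out how withheld content is recognized in the archive, so the following is only a minimal sketch and not the authors' pipeline. It assumes the Stream Grab files have already been decompressed into line-delimited JSON in Twitter's v1.1 tweet format, where a withheld tweet or user object carries a `withheld_in_countries` list of country codes. The function names `censored_tweets` and `censored_users` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def _records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse line-delimited JSON, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            yield obj


def censored_tweets(lines: Iterable[str]) -> Iterator[dict]:
    """Yield tweet objects whose `withheld_in_countries` list is non-empty.

    Assumption: a retweet embeds the original under `retweeted_status`,
    so both the outer and the embedded tweet are checked.
    """
    for obj in _records(lines):
        for tweet in (obj, obj.get("retweeted_status")):
            if isinstance(tweet, dict) and tweet.get("withheld_in_countries"):
                yield tweet


def censored_users(lines: Iterable[str]) -> Iterator[dict]:
    """Yield user objects that are themselves withheld in some country."""
    for obj in _records(lines):
        user = obj.get("user")
        if isinstance(user, dict) and user.get("withheld_in_countries"):
            yield user
```

Both functions accept any iterable of strings, so an open file handle can be passed directly and the archive is scanned one record at a time without loading it into memory.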
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting · Social Media and Politics
