Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter
Thayer Alshaabi, Jane L. Adams, Michael V. Arnold, Joshua R. Minot, David R. Dewhurst, Andrew J. Reagan, Christopher M. Danforth, and Peter Sheridan Dodds

TL;DR
Storywrangler is a tool that analyzes over a decade of Twitter data to track linguistic, cultural, and social trends in real time, enabling diverse sociolinguistic and sociopolitical research.
Contribution
It introduces a large-scale, real-time Twitter data curation system for tracking n-gram usage across multiple languages, with interactive visualization and extensibility to other social media platforms.
Findings
Over 100 billion tweets analyzed from 2008 to 2021
Provides interactive and downloadable time series data
Enables case studies linking social media trends to real-world events
Abstract
In real time, social media data strongly imprints world events, popular culture, and day-to-day conversations by millions of ordinary people at a scale that is scarcely conventionalized and recorded. Vitally, and absent from many standard corpora such as books and news archives, sharing and commenting mechanisms are native to social media platforms, enabling us to quantify social amplification (i.e., popularity) of trending storylines and contemporary cultural phenomena. Here, we describe Storywrangler, a natural language processing instrument designed to carry out an ongoing, day-scale curation of over 100 billion tweets containing roughly 1 trillion 1-grams from 2008 to 2021. For each day, we break tweets into unigrams, bigrams, and trigrams spanning over 100 languages. We track n-gram usage frequencies, and generate Zipf distributions, for words, hashtags, handles, numerals, symbols,…
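The day-scale n-gram counting and Zipf (rank-frequency) step described in the abstract can be illustrated with a minimal Python sketch. This is not Storywrangler's actual pipeline: the function names (`ngrams`, `daily_ngram_counts`, `zipf_table`), the lowercasing, and the whitespace tokenization are assumptions made for illustration, and the toy tweets are invented. Language detection and the handling of hashtags, handles, and symbols are not modeled.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def daily_ngram_counts(tweets, n):
    """Count n-grams over one day's tweets.

    Assumption: lowercasing plus whitespace splitting stands in for a real tokenizer.
    """
    counts = Counter()
    for text in tweets:
        counts.update(ngrams(text.lower().split(), n))
    return counts


def zipf_table(counts):
    """Rank n-grams by count (ties broken alphabetically).

    Returns a list of (rank, ngram, count, relative_frequency) tuples.
    """
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, gram, c, c / total)
            for rank, (gram, c) in enumerate(ranked, start=1)]


# Hypothetical one-day sample; real input would be all tweets for a given day.
toy_day = ["black lives matter", "black lives matter now", "good morning twitter"]
unigrams = daily_ngram_counts(toy_day, 1)
bigrams = daily_ngram_counts(toy_day, 2)
table = zipf_table(unigrams)
```

Repeating this for every day and every n yields per-day rank and frequency time series for each n-gram. The sketch assigns ordinal ranks, so tied counts receive distinct ranks; a production system might prefer fractional or tied ranks instead.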
