How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself
Daniel Gayo-Avello

TL;DR
This paper describes how to build a small, searchable Twitter archive of 1.48 billion tweets covering 2006-2009 from free data sources with open-source tools, so that researchers need not depend on expensive proprietary archives.
Contribution
It provides a practical guide for constructing a manageable Twitter archive from free data sources, demonstrating feasibility without extensive resources.
Findings
Successfully built a 1.48 billion tweet archive from 2006-2009
Provided detailed instructions for replication using open-source tools
Showed that smaller, searchable archives are feasible for academic research
Abstract
Twitter is among the most common sources of data in social media research, mainly because its APIs make collecting tweets convenient. However, most researchers do not have access to the expensive Firehose and Twitter Historical Archive, and they must rely on data collected with free APIs whose representativeness has been questioned. In 2010 the Library of Congress announced an agreement with Twitter to give researchers access to the whole Twitter Archive. However, the task proved daunting and, at the time of this writing, no researcher has had the opportunity to access those materials. Still, previous experiences have shown that smaller searchable archives are feasible and, therefore, within reach of academics with relatively few resources. In this paper I describe my efforts to build one such archive, covering the first three years of Twitter…
Taxonomy
Topics: Web Data Mining and Analysis · Complex Network Analysis Techniques · Advanced Text Analysis Techniques
