A Web Scraping Methodology for Bypassing Twitter API Restrictions
A. Hernandez-Suarez, G. Sanchez-Perez, K. Toscano-Medina, V. Martinez-Hernandez, V. Sanchez, H. Perez-Meana

TL;DR
This paper introduces a web scraping methodology that enables collecting historical Twitter data across any date range, overcoming the platform's API restrictions that limit data retrieval to recent tweets.
Contribution
The paper presents a novel web scraping approach for historical Twitter data collection, bypassing API limitations and expanding data access for research.
Findings
Effective collection of historical tweets across arbitrary date ranges
Bypasses Twitter API restrictions for data retrieval
Facilitates more comprehensive social media data analysis
Abstract
Retrieving information from social networks is the first and most fundamental step in many data analysis fields such as Natural Language Processing, Sentiment Analysis and Machine Learning. Important data science tasks rely on historical data gathering for further predictive results. Most recent works use the Twitter API, a public platform for collecting public streams of information, which only allows querying chronological tweets that are no more than three weeks old. In this paper, we present a new methodology for collecting historical tweets within any date range using web scraping techniques, bypassing Twitter API restrictions.
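The abstract does not spell out how a date range is turned into scrapeable requests. The following is a minimal sketch of one plausible first step, assuming the range is split into daily windows and each window is expressed with the since: and until: operators that Twitter's advanced search has supported. The function names, the base URL, and the windowing scheme are illustrative assumptions, not the authors' implementation. Fetching and parsing the result pages is deliberately left out.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def date_windows(start, end, step_days=1):
    """Yield (since, until) date pairs covering [start, end) in fixed-size chunks.

    Hypothetical helper: the windowing scheme is an assumption for illustration.
    """
    if step_days < 1:
        raise ValueError("step_days must be at least 1")
    cursor = start
    while cursor < end:
        # Clamp the last window so it never runs past the requested end date.
        window_end = min(cursor + timedelta(days=step_days), end)
        yield cursor, window_end
        cursor = window_end


def build_search_url(keyword, since, until, base_url="https://twitter.com/search"):
    """Build a search URL for one window using the since:/until: operators.

    The base URL is an assumption and may not match the endpoint the paper used.
    """
    query = f"{keyword} since:{since.isoformat()} until:{until.isoformat()}"
    return f"{base_url}?{urlencode({'q': query})}"


# One URL per day for a three-day range; each would be handed to a scraper.
urls = [
    build_search_url("python", since, until)
    for since, until in date_windows(date(2015, 1, 1), date(2015, 1, 4))
]
```

Small windows keep each result page short and make a failed request cheap to retry, at the cost of issuing more requests overall.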
Taxonomy
Topics: Web Data Mining and Analysis · Complex Network Analysis Techniques · Advanced Text Analysis Techniques
