A Web Scraping Methodology for Bypassing Twitter API Restrictions

A. Hernandez-Suarez; G. Sanchez-Perez; K. Toscano-Medina; V. Martinez-Hernandez; V. Sanchez; H. Perez-Meana

arXiv:1803.09875 · cs.IR · March 28, 2018 · 45 cites

Open Access

TL;DR

This paper introduces a web scraping methodology that enables collecting historical Twitter data across any date range, overcoming the platform's API restrictions that limit data retrieval to recent tweets.

Contribution

The paper presents a novel web scraping approach for historical Twitter data collection, bypassing API limitations and expanding data access for research.

Findings

01. Effective collection of historical tweets across arbitrary date ranges

02. Bypasses Twitter API restrictions for data retrieval

03. Facilitates more comprehensive social media data analysis

Abstract

Retrieving information from social networks is the first and most fundamental step in many data analysis fields such as Natural Language Processing, Sentiment Analysis and Machine Learning. Important data science tasks rely on historical data gathering for further predictive results. Most recent works use the Twitter API, a public platform for collecting public streams of information, which only allows querying tweets that are no more than three weeks old. In this paper, we present a new methodology for collecting historical tweets within any date range using web scraping techniques, bypassing the Twitter API restrictions.
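The abstract does not describe the implementation, so the following is only an illustrative sketch of one way date-ranged collection could be organised: split the requested range into one-day windows and build one search query per window using Twitter's `since:` and `until:` search operators. The endpoint URL, the helper names `daily_windows` and `build_search_url`, and the one-day window size are assumptions made for illustration, not details taken from the paper. The sketch only constructs URLs; fetching and parsing the result pages is left out.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumed search endpoint (hypothetical for this sketch, not stated in the abstract).
SEARCH_URL = "https://twitter.com/search"


def daily_windows(start, end):
    """Yield (since, until) date pairs covering [start, end) one day at a time."""
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        yield day, next_day
        day = next_day


def build_search_url(keyword, since, until):
    """Build a search URL restricted to one window via since:/until: operators."""
    query = f"{keyword} since:{since.isoformat()} until:{until.isoformat()}"
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


# One URL per day in the requested historical range.
urls = [
    build_search_url("python", since, until)
    for since, until in daily_windows(date(2014, 1, 1), date(2014, 1, 4))
]
for url in urls:
    print(url)
```

Splitting the range into small windows keeps each result page short and lets a failed window be retried on its own. A real scraper would also need to respect the site's terms of service and rate limits.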

Taxonomy

Topics: Web Data Mining and Analysis · Complex Network Analysis Techniques · Advanced Text Analysis Techniques