A Python Library for Exploratory Data Analysis on Twitter Data based on Tokens and Aggregated Origin-Destination Information
Mario Graff, Daniela Moctezuma, Sabino Miranda-Jiménez, and Eric S. Tellez

TL;DR
This paper introduces a Python library that simplifies exploratory data analysis of Twitter data, focusing on tokens and aggregated origin-destination information across multiple languages and domains, to support event detection and mobility studies.
Contribution
The novel library enables comprehensive analysis of Twitter data with multilingual support and aggregated mobility information, facilitating diverse research applications.
Findings
Analyzes topics and language dialects in Twitter data.
Provides mobility reports for over 200 countries.
Extracts frequency and bi-gram statistics for multiple languages.
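To make the frequency and bi-gram statistics concrete, the following is a minimal sketch of how such counts can be computed from raw text. It does not use the library's API: the function name token_and_bigram_counts, the whitespace tokenizer, and the sample sentences are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def token_and_bigram_counts(texts):
    """Count token and bi-gram frequencies over a list of texts.

    Hypothetical helper for illustration; not part of the library.
    """
    tokens = Counter()
    bigrams = Counter()
    for text in texts:
        # Assumption: a naive lowercase whitespace tokenizer stands in
        # for whatever tokenization the library actually applies.
        words = text.lower().split()
        tokens.update(words)
        # Pair each word with its successor to form bi-grams.
        bigrams.update(zip(words, words[1:]))
    return tokens, bigrams


# Made-up sample texts, not real tweets.
sample = ["the earthquake hit the city", "the city is safe"]
tokens, bigrams = token_and_bigram_counts(sample)
print(tokens.most_common(2))
print(bigrams.most_common(1))
```

Counting bi-grams alongside single tokens keeps some local context, so that a pair such as ("the", "city") can be tracked over time in the same way as an individual word.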
Abstract
Twitter is perhaps the social media platform most amenable to research. It requires only a few steps to obtain information, and there are plenty of libraries that can help in this regard. Nonetheless, knowing whether a particular event is expressed on Twitter is a challenging task that requires a considerable collection of tweets. This proposal aims to make it easier for interested researchers to mine events on Twitter by opening a collection of processed information taken from Twitter since December 2015. The events could be related to natural disasters, health issues, and people's mobility, among other studies that can be pursued with the proposed library. Different applications are presented in this contribution to illustrate the library's capabilities: an exploratory analysis of the topics discovered in tweets, a study on similarity among dialects of the Spanish language, and a…
Taxonomy
Topics: Radio, Podcasts, and Digital Media · Human Mobility and Location-Based Analysis · Communication and COVID-19 Impact
