Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data
Federico Bianchi, Vincenzo Cutrona, Dirk Hovy

TL;DR
Twitter-Demographer is a flow-based tool for enriching Twitter datasets with additional information about tweets and users (such as location, age, or sentiment), while addressing privacy and reproducibility concerns.
Contribution
It introduces a modular, extensible, flow-based framework for augmenting Twitter data with auxiliary variables, with an emphasis on privacy-by-design and reproducibility.
Findings
Enables easy chaining and extension of data enrichment components
Provides privacy measures to support pseudo-anonymity
Facilitates reproducible social science and NLP research
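The idea of chaining and extending enrichment components can be illustrated with a minimal Python sketch. It is not Twitter-Demographer's actual API: the `Flow` class, its `then`/`run` methods, and both enrichment functions are hypothetical, and the keyword-based sentiment step is only a toy stand-in for a real classifier. Each component takes a list of tweet records and returns enriched copies, so a new component can be appended without touching the existing ones.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A tweet record is a plain dict mapping column names to values.
Record = Dict[str, object]
# A component takes a batch of records and returns enriched copies.
Component = Callable[[List[Record]], List[Record]]


def add_text_length(records: List[Record]) -> List[Record]:
    """Hypothetical enricher: adds a per-tweet auxiliary variable."""
    return [{**r, "text_length": len(str(r["text"]))} for r in records]


def add_toy_sentiment(records: List[Record]) -> List[Record]:
    """Hypothetical enricher: keyword counting as a stand-in for a real sentiment model."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    enriched = []
    for r in records:
        words = str(r["text"]).lower().split()
        score = sum(w in positive for w in words) - sum(w in negative for w in words)
        label = "positive" if score > 0 else ("negative" if score < 0 else "neutral")
        enriched.append({**r, "sentiment": label})
    return enriched


@dataclass
class Flow:
    """Chains components so each one consumes the previous component's output."""
    components: List[Component] = field(default_factory=list)

    def then(self, component: Component) -> "Flow":
        # Return a new Flow instead of mutating, so partial pipelines can be reused.
        return Flow(self.components + [component])

    def run(self, records: List[Record]) -> List[Record]:
        for component in self.components:
            records = component(records)
        return records


tweets = [
    {"tweet_id": "1", "user_id": "42", "text": "I love this great tool"},
    {"tweet_id": "2", "user_id": "7", "text": "awful weather"},
]
flow = Flow().then(add_text_length).then(add_toy_sentiment)
result = flow.run(tweets)
print([(r["tweet_id"], r["sentiment"], r["text_length"]) for r in result])
# → [('1', 'positive', 22), ('2', 'negative', 13)]
```

Because every component shares the same input/output contract, swapping the toy sentiment step for a real model, or inserting a new step in the middle, only changes the `then(...)` chain.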
Abstract
Twitter data have become essential to Natural Language Processing (NLP) and social science research, driving various scientific discoveries in recent years. However, the textual data alone are often not enough to conduct studies: social scientists in particular need more variables to perform their analyses and control for various factors. How we add this information, such as users' location, age, or tweet sentiment, has ramifications for anonymity and reproducibility, and requires dedicated effort. This paper describes Twitter-Demographer, a simple, flow-based tool to enrich Twitter data with additional information about tweets and users. Twitter-Demographer is aimed at NLP practitioners and (computational) social scientists who want to enrich their datasets with aggregated information, facilitating reproducibility, and providing algorithmic privacy-by-design measures for…
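As one illustration of what a privacy-by-design step can look like for an enriched dataset, the sketch below replaces user identifiers with keyed hashes (HMAC-SHA256) and drops directly identifying fields before release. This is an assumed technique chosen for illustration, not necessarily the measure Twitter-Demographer implements; the `pseudonymize` function and its default field names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def pseudonymize(
    records: Iterable[Record],
    key: bytes,
    id_fields: Tuple[str, ...] = ("user_id",),
    drop_fields: Tuple[str, ...] = ("text", "screen_name"),
) -> List[Record]:
    """Hypothetical helper: replace IDs with keyed hashes and drop identifying fields."""
    released = []
    for r in records:
        # Remove fields that could directly re-identify a user (tweet text is searchable).
        clean = {k: v for k, v in r.items() if k not in drop_fields}
        for f in id_fields:
            if f in clean:
                # Same ID + same key -> same pseudonym, so per-user aggregates still work.
                digest = hmac.new(key, str(clean[f]).encode("utf-8"), hashlib.sha256)
                clean[f] = digest.hexdigest()
        released.append(clean)
    return released


key = secrets.token_bytes(32)  # keep private; discard it to make the mapping one-way
rows = [
    {"user_id": "42", "text": "hello", "sentiment": "neutral"},
    {"user_id": "42", "text": "bye", "sentiment": "positive"},
    {"user_id": "7", "text": "hi", "sentiment": "neutral"},
]
released = pseudonymize(rows, key)
print(released[0]["user_id"] == released[1]["user_id"])  # True: same user, same pseudonym
print("text" in released[0])  # False: raw text was dropped
```

Keyed hashing yields pseudonymity rather than full anonymity: records from the same user stay linkable, which is what allows per-user aggregates, while without the secret key an attacker cannot recover the original IDs by re-hashing candidate values.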
Taxonomy
Topics: Privacy, Security, and Data Protection · Hate Speech and Cyberbullying Detection · Social Media and Politics
