ReportAGE: Automatically extracting the exact age of Twitter users based on self-reports in tweets
Ari Z. Klein, Arjun Magge, Graciela Gonzalez-Hernandez

TL;DR
This paper introduces ReportAGE, an NLP pipeline that automatically detects and extracts the exact age of Twitter users from self-reports in tweets, enabling more precise demographic analysis for social media research.
Contribution
The study develops and evaluates a deep learning-based method for extracting exact user ages from tweets, and deploys it at scale on more than a billion tweets.
Findings
Achieved an F1-score of 0.931 for age detection on test data.
Successfully predicted ages for over 132,000 users from 1.2 billion tweets.
Inter-annotator agreement (Fleiss' kappa) was high: 0.80 for distinguishing "age" from "no age" tweets and 0.95 for identifying the exact age.
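The findings above are reported as an F1-score and Fleiss' kappa. As a reminder of how these two metrics are computed, here is a minimal sketch in Python. The function names and the toy rating table are hypothetical and are not taken from the paper; the table only illustrates the input format (one row per tweet, one column per category, each cell counting how many annotators chose that category).

```python
from typing import List


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j.

    Every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of ratings")
    n_cats = len(counts[0])
    total = n_items * n_raters

    # Proportion of all ratings that fall into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_cats)]

    # Per-item agreement: fraction of annotator pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_items        # observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


# Toy data (hypothetical): 4 tweets, 5 annotators,
# columns = ["age", "no age"].
toy_counts = [[5, 0], [3, 2], [0, 5], [4, 1]]
toy_kappa = fleiss_kappa(toy_counts)
```

A kappa of 1.0 means the annotators agreed on every item, and 0 means agreement no better than chance.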
Abstract
Advancing the utility of social media data for research applications requires methods for automatically detecting demographic information about social media study populations, including users' age. The objective of this study was to develop and evaluate a method that automatically identifies the exact age of users based on self-reports in their tweets. Our end-to-end automatic natural language processing (NLP) pipeline, ReportAGE, includes query patterns to retrieve tweets that potentially mention an age, a classifier to distinguish retrieved tweets that self-report the user's exact age ("age" tweets) and those that do not ("no age" tweets), and rule-based extraction to identify the age. To develop and evaluate ReportAGE, we manually annotated 11,000 tweets that matched the query patterns. Based on 1000 tweets that were annotated by all five annotators, inter-annotator agreement…
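The three stages named in the abstract (query patterns, a classifier, rule-based extraction) can be sketched as a small Python pipeline. This is an illustration under stated assumptions, not the authors' implementation: the regular expression, the function names, and especially `placeholder_classifier` are hypothetical. The paper's classifier is a trained deep learning model; the stub here only marks where such a model would plug in.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical query pattern: a 1-2 digit number next to common age phrasing.
# The real ReportAGE query patterns are not given in this summary.
AGE_QUERY = re.compile(
    r"\b(?:i\s+am|i'm|im)\s+(\d{1,2})\b"      # "I am 23", "I'm 23"
    r"|\b(\d{1,2})\s*(?:years?\s+old|yo)\b",  # "23 years old", "23yo"
    re.IGNORECASE,
)


def retrieve(tweets: List[str]) -> List[str]:
    """Stage 1: keep tweets that potentially mention an age."""
    return [t for t in tweets if AGE_QUERY.search(t)]


def placeholder_classifier(tweet: str) -> bool:
    """Stage 2 stand-in (assumption): a crude first-person check.

    A trained "age" vs. "no age" classifier would replace this function.
    """
    return re.search(r"\b(?:i\s+am|i'm|im)\s+\d{1,2}\b", tweet, re.IGNORECASE) is not None


def extract_age(tweet: str) -> Optional[int]:
    """Stage 3: rule-based extraction of the matched number."""
    match = AGE_QUERY.search(tweet)
    if match is None:
        return None
    return int(match.group(1) or match.group(2))


def run_pipeline(
    tweets: List[str],
    classify: Callable[[str], bool] = placeholder_classifier,
) -> List[Tuple[str, int]]:
    """Chain the three stages and return (tweet, age) pairs."""
    results = []
    for tweet in retrieve(tweets):
        if classify(tweet):
            age = extract_age(tweet)
            if age is not None:
                results.append((tweet, age))
    return results


sample = [
    "I'm 23 years old and still love cartoons",
    "My brother is 12 years old",   # matches the query, but not a self-report
    "Great weather today",          # never retrieved
]
predicted = run_pipeline(sample)
```

Passing the classifier as a parameter keeps retrieval and extraction independent of the model. The stub would still accept a tweet such as "I'm 5 minutes late", which is the kind of false match a trained classifier is meant to filter out.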
