Probabilistic Inference of Twitter Users' Age based on What They Follow
Benjamin Paul Chamberlain, Clive Humby, Marc Peter Deisenroth

TL;DR
This paper presents a scalable, language-independent Bayesian method to infer Twitter users' ages based on their follow networks, enabling age estimation for hundreds of millions of accounts.
Contribution
It introduces a Bayesian approach that generalizes ground-truth age information from a small set of users to the rest of the Twitter network using only follow relationships, without relying on language-dependent features.
Findings
Scales to infer ages of 700 million accounts
Achieves high accuracy in age prediction
Operates independently of language or tweet content
Abstract
Twitter provides an open and rich source of data for studying human behaviour at scale and is widely used in social and network sciences. However, a major criticism of Twitter data is that demographic information is largely absent. Enhancing Twitter data with user ages would advance our ability to study social network structures, information flows and the spread of contagions. Approaches toward age detection of Twitter users typically focus on specific properties of tweets, e.g., linguistic features, which are language dependent. In this paper, we devise a language-independent methodology for determining the age of Twitter users from data that is native to the Twitter ecosystem. The key idea is to use a Bayesian framework to generalise ground-truth age information from a few Twitter users to the entire network based on what/whom they follow. Our approach scales to inferring the age of…
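To make the key idea concrete, here is a minimal sketch of how age labels from a few users might be generalised through follow data. It assumes a naive Bayes model with Bernoulli follow features and Laplace smoothing, which is an illustrative choice and not necessarily the model used in the paper. The account names, the age bins, and the functions `fit` and `posterior` are all hypothetical.

```python
import math

def fit(labelled, age_bins, alpha=1.0):
    """Estimate P(age bin) and P(follows account | age bin) from labelled users.

    labelled: list of (set of followed accounts, age bin) pairs.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    accounts = sorted({a for follows, _ in labelled for a in follows})
    n = {b: 0 for b in age_bins}                          # users per bin
    k = {b: {a: 0 for a in accounts} for b in age_bins}   # followers per bin and account
    for follows, b in labelled:
        n[b] += 1
        for a in follows:
            k[b][a] += 1
    total = len(labelled)
    prior = {b: (n[b] + alpha) / (total + alpha * len(age_bins)) for b in age_bins}
    p_follow = {
        b: {a: (k[b][a] + alpha) / (n[b] + 2 * alpha) for a in accounts}
        for b in age_bins
    }
    return prior, p_follow

def posterior(follows, prior, p_follow):
    """Return P(age bin | follows), treating each follow as conditionally independent."""
    log_post = {}
    for b, pb in prior.items():
        lp = math.log(pb)
        for a, p in p_follow[b].items():
            # Bernoulli likelihood: the user either follows the account or does not.
            lp += math.log(p if a in follows else 1.0 - p)
        log_post[b] = lp
    # Normalise in log space to avoid underflow when there are many accounts.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {b: math.exp(v - m) / z for b, v in log_post.items()}

# Toy, made-up data: four users with known age bins.
age_bins = ["under_30", "30_plus"]
labelled = [
    ({"@pop_band", "@news"}, "under_30"),
    ({"@pop_band"}, "under_30"),
    ({"@pension_news", "@news"}, "30_plus"),
    ({"@pension_news"}, "30_plus"),
]
prior, p_follow = fit(labelled, age_bins)

# An unlabelled user who follows only the hypothetical @pop_band account.
post = posterior({"@pop_band"}, prior, p_follow)
print(post)
```

In this toy example the unlabelled user receives most of the posterior mass in the `under_30` bin, because the only account they follow is followed mainly by labelled users in that bin. Nothing in the computation looks at tweet text, which is what makes this kind of approach language independent.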
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Spam and Phishing Detection
