Uncovering the Sociodemographic Fabric of Reddit
Federico Cinus, Corrado Monti, Paolo Bajardi, Gianmarco De Francisci Morales

TL;DR
This paper presents a transparent, reliable framework for inferring sociodemographic traits of Reddit users from self-declared data. The framework outperforms more complex models and emphasizes ethical considerations in computational social science.
Contribution
Introduces a principled, interpretable sociodemographic inference method leveraging self-declared user data, outperforming existing models and promoting ethical research practices.
Findings
Naive Bayes outperforms complex embedding models by up to 19% in ROC AUC.
The approach keeps quantification error below 15%.
The models produce well-calibrated, interpretable outputs for sociodemographic analysis.
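The two metrics named in these findings can be illustrated with a minimal sketch. The summary does not give the paper's exact definitions, so the code below assumes the standard rank-based ROC AUC and treats quantification error as the absolute gap between the true class prevalence and the mean predicted probability. The function names, labels, and scores are hypothetical toy values, not results from the paper.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: the probability that a random positive example
    is scored above a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def quantification_error(labels, scores):
    """Assumed definition: absolute difference between the true share of
    positives and the share estimated by averaging predicted probabilities."""
    true_prevalence = sum(labels) / len(labels)
    estimated_prevalence = sum(scores) / len(scores)
    return abs(true_prevalence - estimated_prevalence)


# Hypothetical toy data: 1 = user has the attribute, 0 = user does not.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

auc = roc_auc(labels, scores)
qe = quantification_error(labels, scores)
print(f"ROC AUC: {auc:.4f}, quantification error: {qe:.4f}")
```

ROC AUC measures how well individual users are ranked, while quantification error measures how well the population-level share is recovered, so a model can do well on one and poorly on the other.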
Abstract
Understanding the sociodemographic composition of online platforms is essential for accurately interpreting digital behavior and its societal implications. Yet, current methods often lack the transparency and reliability required, risking misrepresenting social identities and distorting our understanding of digital society. Here, we introduce a principled framework for sociodemographic inference on Reddit that leverages over 850,000 user self-declarations of age, gender, and partisan affiliation. By training models on sparse user activity signals from this extensive, self-disclosed dataset, we demonstrate that simple probabilistic models, such as Naive Bayes, outperform more complex embedding-based alternatives. Our approach improves classification performance over the state of the art by up to 19% in ROC AUC and maintains quantification error below 15%. The models produce…
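As a rough illustration of the kind of simple probabilistic model the abstract describes, here is a minimal Bernoulli Naive Bayes sketch in plain Python. It assumes each user is represented by the set of subreddits they are active in and that the target attribute is binary. The abstract does not specify the Naive Bayes variant, the feature encoding, or the smoothing, so those choices, along with the subreddit names and all function names, are assumptions for illustration only.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(users, labels, alpha=1.0):
    """Fit per-class subreddit presence probabilities with Laplace smoothing.

    users:  list of sets of subreddit names (sparse activity signals)
    labels: list of 0/1 self-declared attribute values
    alpha:  smoothing strength (an assumed default, not from the paper)
    """
    vocab = sorted(set().union(*users))
    n_class = {0: 0, 1: 0}
    counts = {0: defaultdict(int), 1: defaultdict(int)}
    for subs, y in zip(users, labels):
        n_class[y] += 1
        for s in subs:
            counts[y][s] += 1
    total = len(labels)
    model = {"vocab": vocab, "log_prior": {}, "p": {}}
    for c in (0, 1):
        model["log_prior"][c] = math.log((n_class[c] + alpha) / (total + 2 * alpha))
        model["p"][c] = {
            s: (counts[c][s] + alpha) / (n_class[c] + 2 * alpha) for s in vocab
        }
    return model


def predict_proba(model, subs):
    """Return P(label = 1 | subreddit set) under the fitted model."""
    log_post = {}
    for c in (0, 1):
        lp = model["log_prior"][c]
        for s in model["vocab"]:
            p = model["p"][c][s]
            lp += math.log(p) if s in subs else math.log(1.0 - p)
        log_post[c] = lp
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return math.exp(log_post[1] - m) / z


# Hypothetical users and placeholder subreddit names.
users = [
    {"sub_a", "sub_b"}, {"sub_a"}, {"sub_a", "sub_c"},
    {"sub_d"}, {"sub_d", "sub_c"}, {"sub_d", "sub_b"},
]
labels = [1, 1, 1, 0, 0, 0]

model = train_bernoulli_nb(users, labels)
p_a = predict_proba(model, {"sub_a"})
p_d = predict_proba(model, {"sub_d"})
print(f"P(1 | sub_a) = {p_a:.3f}, P(1 | sub_d) = {p_d:.3f}")
```

Because every fitted parameter is a per-subreddit probability, the contribution of each subreddit to a prediction can be read off directly, which is one plausible reading of the interpretability the summary refers to.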
Taxonomy
Topics: Social Media and Politics · Digital Marketing and Social Media · Media Influence and Politics
