Latent demographic profile estimation in hard-to-reach groups
Tyler H. McCormick, Tian Zheng

TL;DR
This paper introduces a Bayesian hierarchical model that uses social network survey data to estimate the demographic profiles of hard-to-reach groups without requiring a special sampling strategy, supporting social science research on populations that standard surveys cannot reach directly.
Contribution
It presents a novel statistical approach using Aggregated Relational Data and Bayesian modeling to estimate demographic profiles of hidden populations.
Findings
Successfully estimated age and gender profiles of six hard-to-reach groups.
Demonstrated effectiveness of the model with real and simulated data.
Provided practical guidelines for data collection and estimation.
Abstract
The sampling frame in most social science surveys excludes members of certain groups, known as hard-to-reach groups. These groups, or subpopulations, may be difficult to access (e.g., the homeless), camouflaged by stigma (individuals with HIV/AIDS), or both (commercial sex workers). Even basic demographic information about these groups is typically unknown, especially in many developing nations. We present statistical models which leverage social network structure to estimate demographic characteristics of these subpopulations using aggregated relational data (ARD), or questions of the form "How many X's do you know?" Unlike other network-based techniques for reaching these groups, ARD require no special sampling strategy and are easily incorporated into standard surveys. ARD also do not require respondents to reveal their own group membership. We propose a Bayesian hierarchical model…
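To make the ARD format concrete, here is a minimal Python sketch of what responses to "How many X's do you know?" look like and how they can be aggregated. It uses the classical network scale-up estimator of subpopulation size, which is a simpler, well-known technique and not the Bayesian hierarchical model this paper proposes. The function name, the respondent data, and the assumption that each respondent's personal network size is known are all hypothetical, chosen for illustration only.

```python
# Illustrative sketch only: a basic network scale-up estimate from ARD.
# This is NOT the paper's Bayesian hierarchical model; all names and
# numbers below are hypothetical.

def scale_up_estimate(ard_counts, degrees, population_size):
    """Estimate the size of a hidden subpopulation from ARD.

    ard_counts[i]   : respondent i's answer to "How many X's do you know?"
    degrees[i]      : respondent i's personal network size (assumed known here)
    population_size : size of the total population
    """
    if len(ard_counts) != len(degrees):
        raise ValueError("ard_counts and degrees must have the same length")
    total_degree = sum(degrees)
    if total_degree == 0:
        raise ValueError("total network size must be positive")
    # Share of all reported ties that point into group X,
    # scaled up to the whole population.
    return population_size * sum(ard_counts) / total_degree


# Five hypothetical respondents; none reveals their own group membership.
ard_counts = [0, 2, 1, 0, 3]
degrees = [150, 400, 250, 100, 300]
estimate = scale_up_estimate(ard_counts, degrees, population_size=1_000_000)
print(estimate)
```

Note that each respondent contributes only a count, which is why ARD questions fit into a standard survey. Estimating the demographic profile of the group, as the paper does, needs additional modeling beyond this aggregate ratio.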
