Data-Driven Strategies for Detecting and Sampling Misrepresented Subgroups
G. Lancia, F. Mecatti, E. Riccomagno

TL;DR
This paper introduces a data-driven method using unsupervised learning to identify and improve the sampling of under-represented subgroups in survey data, enhancing socio-demographic analysis.
Contribution
It presents a novel approach combining univariate and multivariate unsupervised learning to detect and characterize under-represented groups in survey datasets.
Findings
Key indicators of under-representation include citizenship and economic vulnerability.
The method identifies rare subgroups in the 2019 EU-SILC data for the Italian region of Liguria.
Characterising the detected groups can inform targeted resampling, which improves survey inclusiveness and supports evidence-based policy design.
Abstract
Economic policy research frequently examines population well-being, with a particular focus on the relationships between unequal living conditions, low educational attainment, and social exclusion. Sample surveys, such as EU-SILC, are widely used for this purpose and inform public policy; yet, their sampling designs may fail to adequately represent rare, hard-to-sample, or under-covered subgroups. This limitation can hinder socio-demographic analyses and evidence-based policy design. We propose a generalisable approach based on univariate and multivariate unsupervised learning techniques to detect outliers in survey data that may signal under-represented subgroups. Identified groups can then be characterised to inform targeted resampling strategies that improve survey inclusiveness. An empirical application using the 2019 EU-SILC data for the Italian region of Liguria shows that…
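As a rough illustration of the univariate and multivariate screening described in the abstract, the Python sketch below flags outliers with Tukey's interquartile-range fences and with a k-nearest-neighbour distance score. Both techniques are stand-ins chosen for brevity; the abstract does not say which methods the authors use. The records, variable names, and thresholds are hypothetical and are not EU-SILC values.

```python
import math
import statistics


def tukey_outliers(values, k=1.5):
    """Univariate screen: indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def standardise(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)
    return [(v - mean) / sd for v in column]


def knn_distance_scores(rows, k=2):
    """Multivariate screen: distance from each row to its k-th nearest neighbour.

    Larger scores mean the record sits in a sparsely populated region.
    """
    scores = []
    for i, a in enumerate(rows):
        dists = sorted(math.dist(a, b) for j, b in enumerate(rows) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical toy records (assumption, not survey data):
# income in thousands of euros and household size.
income = [21, 23, 22, 24, 20, 25, 23, 22, 3, 24]
household_size = [2, 3, 2, 3, 2, 3, 2, 3, 7, 2]

# Univariate pass on a single variable.
flagged_univariate = tukey_outliers(income)

# Multivariate pass on standardised variables, so no column dominates the distance.
rows = list(zip(standardise(income), standardise(household_size)))
scores = knn_distance_scores(rows, k=2)
most_isolated = max(range(len(scores)), key=scores.__getitem__)

print("Univariate flags:", flagged_univariate)
print("Most isolated record:", most_isolated)
```

Records flagged by either pass would then be profiled (for example by citizenship or economic vulnerability, the indicators named in the findings) before deciding whether they point to a subgroup worth oversampling. A flagged record is only a candidate: it may equally be a data-entry error.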
Taxonomy
Topics: Census and Population Estimation · Insurance, Mortality, Demography, Risk Management · Data-Driven Disease Surveillance
