Combining Data from Surveys and Related Sources
Dexter Cahoy, Joseph Sedransk

TL;DR
This paper develops methods for combining survey data with administrative and other related sources to improve the precision of inference, focusing on uncertain pooling and Dirichlet process mixtures, with properties checked analytically and numerically.
Contribution
It presents a methodology for combining data from multiple sources that has a more general structure than those in current practice, covering uncertain pooling and Dirichlet process mixtures.
Findings
Uncertain pooling starts from the survey regarded as the single best source and adds data from the other sources that form clusters with it.
Dirichlet process mixtures provide flexible nonparametric modeling.
Methodological properties are validated through analytical and numerical analysis.
Abstract
To improve the precision of inferences and reduce costs, there is considerable interest in combining data from several sources such as sample surveys and administrative data. Appropriate methodology is required to ensure satisfactory inferences since the target populations and methods for acquiring data may be quite different. To provide improved inferences we use methodology that has a more general structure than the ones in current practice. We start with the case where the analyst has only summary statistics from each of the sources. In our primary method, uncertain pooling, it is assumed that the analyst can regard one source, a designated survey, as the single best choice for inference. This method starts with the data from that survey and adds data from those other sources that are shown to form clusters that include it. We also consider Dirichlet process mixtures, one of the most…
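The pooling idea in the abstract can be illustrated with a small sketch. This is not the authors' model: it is a simplified partition-averaging example that assumes each source reports only a sample mean and a known sampling variance, that sources in the same cluster share one common mean with a normal prior, and that all partitions are equally likely a priori. The function names (`set_partitions`, `block_log_marginal`, `uncertain_pooling`), the prior parameters `m0` and `t2`, and the numbers in the demo are hypothetical.

```python
import numpy as np

def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or open a new block containing only `first`.
        yield [[first]] + part

def block_log_marginal(y, v, m0, t2):
    """Log density of y ~ N(m0 * 1, diag(v) + t2 * 11'), i.e. the block's
    marginal likelihood after integrating out its common mean."""
    k = len(y)
    cov = np.diag(v) + t2 * np.ones((k, k))
    resid = y - m0
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)

def uncertain_pooling(ybar, var, target=0, m0=0.0, t2=100.0):
    """Average the target source's posterior mean over all partitions.

    Assumed model (illustrative only): ybar[i] ~ N(mu_block, var[i]),
    mu_block ~ N(m0, t2), uniform prior over partitions.
    """
    ybar = np.asarray(ybar, dtype=float)
    var = np.asarray(var, dtype=float)
    parts, logw, means = [], [], []
    for part in set_partitions(list(range(len(ybar)))):
        # Partition weight: product of block marginal likelihoods.
        lw = sum(block_log_marginal(ybar[b], var[b], m0, t2) for b in part)
        # Conditional posterior mean for the block that holds the target.
        blk = next(b for b in part if target in b)
        prec = 1.0 / t2 + np.sum(1.0 / var[blk])
        mean = (m0 / t2 + np.sum(ybar[blk] / var[blk])) / prec
        parts.append(part)
        logw.append(lw)
        means.append(mean)
    logw = np.array(logw)
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    w /= w.sum()
    return parts, w, float(w @ np.array(means))

# Hypothetical summary statistics: source 0 is the preferred survey,
# sources 1 and 2 look similar to it, source 3 looks different.
ybar = [10.0, 10.2, 9.9, 15.0]
var = [0.25, 0.25, 0.25, 0.25]
parts, w, post_mean = uncertain_pooling(ybar, var, target=0, m0=10.0, t2=100.0)
best = parts[int(np.argmax(w))]
print("most probable partition:", best)
print("posterior mean for the target source:", post_mean)
```

Enumerating all partitions is only practical for a handful of sources, since the number of partitions grows as the Bell numbers; the sketch is meant to show how data from sources that cluster with the preferred survey end up contributing to its estimate, while dissimilar sources receive little weight.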
Taxonomy
Topics: Bayesian Methods and Mixture Models · Census and Population Estimation · Statistical Methods and Bayesian Inference
