Survey Data Integration for Distribution Function Estimation
Jeremy Flood, Sayed Mostafa

TL;DR
This paper introduces a novel residual-based method for integrating probability and nonprobability survey data to accurately estimate cumulative distribution functions and quantiles, with proven theoretical properties and demonstrated empirical performance.
Contribution
It proposes a semiparametric residual-based CDF estimator that combines a probability sample with nonprobability samples through covariates observed in both, providing a new approach for survey data integration with established asymptotic properties.
Findings
The estimator performs favorably compared to existing methods.
Theoretical analysis confirms bias and variance properties.
Empirical results validate the estimator's effectiveness.
Abstract
Estimates of finite population cumulative distribution functions (CDFs) and quantiles are critical for policy-making, resource allocation, and public health planning. For instance, federal finance agencies may require accurate estimates of the proportion of individuals with income below the federal poverty line to determine funding eligibility, while health organizations may rely on precise quantile estimates of key health variables to guide local health interventions. Despite growing interest in survey data integration, research on the integration of probability and nonprobability samples to estimate CDFs and quantiles remains limited. In this study, we propose a novel residual-based CDF estimator that integrates information from a probability sample with data from potentially large nonprobability samples. Our approach leverages shared covariates observed in both datasets, while the…
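To make the idea of a residual-based CDF estimator concrete, here is a minimal Python sketch. It is not the authors' estimator, whose exact form is not given in the text above. It assumes, for illustration only, that the study variable y is observed in the nonprobability sample alone, that the probability sample supplies covariates and design weights, and that a linear working model is adequate. The function name `residual_cdf_sketch` and all argument names are hypothetical.

```python
import numpy as np


def residual_cdf_sketch(x_np, y_np, x_p, d_p, t):
    """Illustrative residual-based estimate of F(t) = P(y <= t).

    Assumptions (not taken from the paper):
      x_np, y_np : covariates and study variable from the nonprobability sample
      x_p, d_p   : covariates and design weights from the probability sample
      a linear working model y = b0 + x'b + e fitted on the nonprobability sample
    """
    x_np = np.asarray(x_np, dtype=float)
    y_np = np.asarray(y_np, dtype=float)
    x_p = np.asarray(x_p, dtype=float)
    d_p = np.asarray(d_p, dtype=float)

    # Design matrices with an intercept column.
    X_np = np.column_stack([np.ones(len(y_np)), x_np])
    X_p = np.column_stack([np.ones(len(d_p)), x_p])

    # Fit the working model on the nonprobability sample and keep its residuals.
    beta, *_ = np.linalg.lstsq(X_np, y_np, rcond=None)
    resid = y_np - X_np @ beta

    # Predict y for every probability-sample unit from the shared covariates.
    pred_p = X_p @ beta

    # For each probability-sample unit, average the indicator
    # 1{prediction + residual <= t} over the empirical residuals.
    indicators = (pred_p[:, None] + resid[None, :]) <= t
    unit_cdf = indicators.mean(axis=1)

    # Combine the unit-level values with normalized design weights.
    return float(np.sum(d_p * unit_cdf) / np.sum(d_p))
```

Calling the function over a grid of thresholds t traces out an estimated CDF, and a quantile estimate can then be read off as the smallest t at which the estimate reaches the target probability. The estimate is nondecreasing in t by construction, since each indicator is.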
Taxonomy
Topics: Bayesian Methods and Mixture Models
