An Efficient Approach for Statistical Matching of Survey Data Through Calibration, Optimal Transport and Balanced Sampling
Raphaël Jauslin, Yves Tillé

TL;DR
This paper introduces an efficient statistical matching method that combines survey data using calibration, optimal transport, and balanced sampling to improve data integration and analysis.
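Calibration adjusts each sample's weights so that weighted totals of auxiliary variables common to both surveys reach the same targets. The sketch below assumes linear (chi-square distance) calibration on an intercept and one auxiliary variable z. The function name `linear_calibration` and the target totals are illustrative assumptions, not the authors' code or settings.

```python
# Hypothetical sketch of linear (chi-square distance) calibration; not the
# authors' implementation. Auxiliary vector per unit: x_k = (1, z_k).


def linear_calibration(d, z, total_n, total_z):
    """Return weights w_k = d_k * (1 + lam0 + lam1 * z_k) such that
    sum(w) == total_n and sum(w_k * z_k) == total_z."""
    # Current weighted estimates of the two totals.
    est_n = sum(d)
    est_z = sum(dk * zk for dk, zk in zip(d, z))
    # Entries of the symmetric matrix T = sum_k d_k x_k x_k'.
    t11, t12 = est_n, est_z
    t22 = sum(dk * zk * zk for dk, zk in zip(d, z))
    # The calibration equations reduce to T @ lam = target - estimate.
    r1, r2 = total_n - est_n, total_z - est_z
    det = t11 * t22 - t12 * t12
    if det == 0:
        raise ValueError("z is constant in the sample; T is singular")
    # Solve the 2x2 system by Cramer's rule.
    lam0 = (r1 * t22 - t12 * r2) / det
    lam1 = (t11 * r2 - t12 * r1) / det
    return [dk * (1 + lam0 + lam1 * zk) for dk, zk in zip(d, z)]


# Illustrative targets: population size 40 and total of z equal to 110.
w = linear_calibration([10.0] * 4, [1.0, 2.0, 3.0, 4.0], 40.0, 110.0)
print(w)  # approximately [7.0, 9.0, 11.0, 13.0]
```

If both samples are calibrated to identical targets, their weight vectors end up with equal totals. That is the precondition for transporting one sample's weights onto the other. Linear calibration can produce negative weights when the targets are far from the initial estimates, and bounded variants such as raking avoid this.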
Contribution
It presents a method for matching the records of two samples, each of which may carry its own weighting scheme, so that data from both sources can be integrated into a single file.
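Matching two weighted samples can be framed as optimal transport. The weights of sample 1 are moved onto those of sample 2 at minimal total cost, where cost is a distance between units on the matching variables. The sketch below assumes a single continuous matching variable and cost |x1 - x2|. In that one-dimensional case the optimal plan is the monotone coupling, which pairs units in sorted order, so no linear-programming solver is needed. With several matching variables one would solve the general transportation problem instead. The name `monotone_transport` is hypothetical.

```python
# Hypothetical sketch: optimal transport between two weighted samples on a
# single continuous matching variable. For cost |x1 - x2| (or any convex
# function of the difference) the monotone coupling below is optimal.


def monotone_transport(x1, w1, x2, w2, tol=1e-12):
    """Return the transport plan as a list of (i, j, mass) triples.
    Assumes the two weight vectors have the same total."""
    total1, total2 = sum(w1), sum(w2)
    if abs(total1 - total2) > 1e-9 * max(1.0, abs(total1)):
        raise ValueError("the two samples must have equal total weight")
    # Visit units of each sample in increasing order of the matching variable.
    order1 = sorted(range(len(x1)), key=lambda i: x1[i])
    order2 = sorted(range(len(x2)), key=lambda j: x2[j])
    rem1 = [w1[i] for i in order1]  # mass of sample-1 units not yet moved
    rem2 = [w2[j] for j in order2]  # capacity left on sample-2 units
    plan, a, b = [], 0, 0
    while a < len(order1) and b < len(order2):
        mass = min(rem1[a], rem2[b])
        if mass > tol:
            plan.append((order1[a], order2[b], mass))
        rem1[a] -= mass
        rem2[b] -= mass
        # Move past any unit whose mass or capacity is now exhausted.
        if rem1[a] <= tol:
            a += 1
        if rem2[b] <= tol:
            b += 1
    return plan


plan = monotone_transport([1.0, 3.0, 2.0], [2.0, 2.0, 2.0], [2.5, 0.5], [3.0, 3.0])
print(plan)  # [(0, 1, 2.0), (2, 1, 1.0), (2, 0, 1.0), (1, 0, 2.0)]
```

Each step exhausts at least one unit, so the plan has at most n1 + n2 - 1 nonzero entries. Here that bound is 4, for n1 = 3 and n2 = 2, so most possible pairs receive no mass.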
Findings
Improved matching accuracy over existing methods
Flexible variants for different data integration scenarios
Enhanced usability of combined survey datasets
Abstract
Statistical matching aims to integrate two statistical sources. These sources can be two samples or a sample and the entire population. If two samples have been selected from the same population and information has been collected on different variables of interest, then it is interesting to match the two surveys to analyse, for example, contingency tables or covariances. In this paper, we propose an efficient method for matching two samples that may each contain a weighting scheme. The method matches the records of the two sources. Several variants are proposed in order to create a directly usable file integrating data from both information sources.
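A transport plan stored as (i, j, mass) triples is not yet a file an analyst can use directly. The sketch below shows two hypothetical ways to turn one into a data set. The first is a stacked file with one row per nonzero entry, weighted by the transported mass. The second is a one-donor-per-unit file. That second function draws each unit's donor independently with probability mass / w1[i]. It is a simpler stand-in for the balanced sampling the method relies on, and it reproduces the plan only in expectation. All names and data are illustrative assumptions.

```python
import random

# Hypothetical sketch: turning a transport plan of (i, j, mass) triples into
# usable files. Not the authors' implementation.


def stacked_file(plan, rec1, rec2):
    """One row per nonzero plan entry, merging the variables of unit i of
    sample 1 and unit j of sample 2 (keys assumed distinct), weighted by
    the transported mass."""
    rows = []
    for i, j, mass in plan:
        row = dict(rec1[i])
        row.update(rec2[j])
        row["weight"] = mass
        rows.append(row)
    return rows


def one_donor_per_unit(plan, w1, seed=0):
    """Draw one donor j for each unit i of sample 1 with probability
    mass / w1[i], independently across units. Stand-in for balanced
    sampling: it reproduces the plan only in expectation."""
    rng = random.Random(seed)
    candidates = {}
    for i, j, mass in plan:
        candidates.setdefault(i, []).append((j, mass))
    donors = {}
    for i, pairs in candidates.items():
        js = [j for j, _ in pairs]
        probs = [mass / w1[i] for _, mass in pairs]
        donors[i] = rng.choices(js, weights=probs, k=1)[0]
    return donors


# Illustrative plan between a 3-unit sample (weights 2, 2, 2) and a 2-unit one.
plan = [(0, 1, 2.0), (2, 1, 1.0), (2, 0, 1.0), (1, 0, 2.0)]
rec1 = [{"y": 10}, {"y": 30}, {"y": 20}]  # variables observed only in sample 1
rec2 = [{"z": "a"}, {"z": "b"}]           # variables observed only in sample 2
rows = stacked_file(plan, rec1, rec2)
donors = one_donor_per_unit(plan, [2.0, 2.0, 2.0], seed=1)
print(rows)
print(donors)
```

The stacked file keeps all of the plan's information and preserves the sample-1 weight total exactly. The one-donor file has exactly one record per sample-1 unit, which is easier to analyse but adds selection variance.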
Taxonomy
Topics: Data Management and Algorithms · Data Quality and Management · Bayesian Modeling and Causal Inference
