Data Fusion for Joining Income and Consumption Information Using Different Donor-Recipient Distance Metrics
Florian Meinfelder, Jannik Schaller

TL;DR
This paper compares two nearest neighbour matching methods for data fusion of income and consumption data, finding that Predictive Mean Matching generally performs better than Random Hot Deck in simulation studies.
Contribution
It compares Random Hot Deck and Predictive Mean Matching as donor-recipient matching methods for data fusion, highlighting the advantages of the latter for the joint analysis of income and consumption.
Findings
Predictive Mean Matching outperforms Random Hot Deck in simulations.
Matching method choice significantly affects data fusion results.
Predictive Mean Matching provides more accurate joint data estimates.
Abstract
Data fusion describes the method of combining data from (at least) two initially independent data sources to allow for joint analysis of variables which are not jointly observed. The fundamental idea is to base inference on identifying assumptions, and on common variables which provide information that is jointly observed in all the data sources. A popular class of methods dealing with this particular missing-data problem is based on nearest neighbour matching. However, exact matches become unlikely with increasing common information, and the specification of the distance function can influence the results of the data fusion. In this paper we compare two different approaches of nearest neighbour hot deck matching: One, Random Hot Deck, is a variant of the covariate-based matching methods which was proposed by Eurostat, and can be considered as a 'classical' statistical matching method,…
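As a rough illustration of the two matching ideas named in the abstract, the sketch below imputes an unobserved variable Z into a recipient file from a donor file that shares a common variable X. It is not the authors' implementation: the function names, the use of a single common variable, the donation classes built by binning X, and the ordinary least squares model behind the predictive means are all assumptions made for this example.

```python
import numpy as np


def random_hot_deck(cls_rec, cls_don, z_don, rng):
    """Assumed variant: draw one donor at random from the recipient's donation class."""
    z_imp = np.empty(len(cls_rec), dtype=float)
    for i, c in enumerate(cls_rec):
        pool = np.flatnonzero(cls_don == c)
        if pool.size == 0:
            # Assumption: fall back to the full donor file if the class is empty.
            pool = np.arange(len(z_don))
        z_imp[i] = z_don[rng.choice(pool)]
    return z_imp


def predictive_mean_matching(x_rec, x_don, z_don):
    """Match each recipient to the donor whose predicted Z is closest."""
    X_don = np.column_stack([np.ones(len(x_don)), x_don])
    X_rec = np.column_stack([np.ones(len(x_rec)), x_rec])
    # Fit Z ~ X on the donor file (OLS is an assumption of this sketch).
    beta, *_ = np.linalg.lstsq(X_don, z_don, rcond=None)
    pred_don = X_don @ beta
    pred_rec = X_rec @ beta
    # Distance is measured on the predicted means, not on X itself.
    nearest = np.abs(pred_rec[:, None] - pred_don[None, :]).argmin(axis=1)
    return z_don[nearest]


# Synthetic example data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x_don = rng.normal(size=200)
z_don = 2.0 + 1.5 * x_don + rng.normal(scale=0.5, size=200)
x_rec = rng.normal(size=100)

# Donation classes: three bins of the common variable X.
bins = [-0.5, 0.5]
cls_don = np.digitize(x_don, bins)
cls_rec = np.digitize(x_rec, bins)

z_rhd = random_hot_deck(cls_rec, cls_don, z_don, rng)
z_pmm = predictive_mean_matching(x_rec, x_don, z_don)
```

Both functions return observed donor values only, which is the defining property of hot deck methods; they differ in how the donor is chosen. The sketch makes no claim about which method performs better.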
Taxonomy
Topics: Statistical Methods and Inference
