Data Fusion for Joining Income and Consumption Information Using Different Donor-Recipient Distance Metrics

Florian Meinfelder; Jannik Schaller

arXiv:2012.00081·stat.ME·December 2, 2020

Open Access

TL;DR

This paper compares two nearest neighbour matching methods for data fusion of income and consumption data, finding that Predictive Mean Matching generally performs better than Random Hot Deck in simulation studies.

Contribution

The paper compares Random Hot Deck and Predictive Mean Matching for data fusion, highlighting the advantages of the latter for the joint analysis of income and consumption.

Findings

01. Predictive Mean Matching outperforms Random Hot Deck in simulations.

02. Matching method choice significantly affects data fusion results.

03. Predictive Mean Matching provides more accurate joint data estimates.

Abstract

Data fusion describes the method of combining data from (at least) two initially independent data sources to allow for joint analysis of variables which are not jointly observed. The fundamental idea is to base inference on identifying assumptions, and on common variables which provide information that is jointly observed in all the data sources. A popular class of methods dealing with this particular missing-data problem is based on nearest neighbour matching. However, exact matches become unlikely with increasing common information, and the specification of the distance function can influence the results of the data fusion. In this paper we compare two different approaches of nearest neighbour hot deck matching: One, Random Hot Deck, is a variant of the covariate-based matching methods which was proposed by Eurostat, and can be considered as a 'classical' statistical matching method,…
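
To make the two matching ideas concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified Random Hot Deck that draws a donor at random from those sharing the recipient's class on a single categorical common variable, and a Predictive Mean Matching variant that fits an ordinary least squares model on the donor file and matches on the absolute difference of predicted values. The function names (`random_hot_deck`, `predictive_mean_matching`), the fallback to the full donor pool, and the parameter `k` are hypothetical choices made for illustration.

```python
import numpy as np


def random_hot_deck(x_rec, x_don, y_don, rng):
    """Impute y for each recipient from a random donor in the same class.

    x_rec, x_don: 1-D arrays of class labels (the common variable).
    y_don: donor values of the variable not observed for recipients.
    Assumption: if a class has no donor, draw from all donors instead.
    """
    x_rec, x_don, y_don = map(np.asarray, (x_rec, x_don, y_don))
    imputed = np.empty(len(x_rec), dtype=float)
    for i, cls in enumerate(x_rec):
        pool = np.flatnonzero(x_don == cls)  # donors with an exact match
        if pool.size == 0:
            pool = np.arange(len(x_don))     # hypothetical fallback rule
        imputed[i] = y_don[rng.choice(pool)]
    return imputed


def predictive_mean_matching(X_rec, X_don, y_don, rng, k=1):
    """Impute y by matching recipients to donors on predicted means.

    Fits OLS of y on X in the donor file, predicts for both files, and
    for each recipient draws one of the k donors whose predicted value
    is closest. The imputed value is the donor's observed y.
    """
    X_rec, X_don, y_don = map(np.asarray, (X_rec, X_don, y_don))
    A_don = np.column_stack([np.ones(len(X_don)), X_don])  # add intercept
    A_rec = np.column_stack([np.ones(len(X_rec)), X_rec])
    beta, *_ = np.linalg.lstsq(A_don, y_don, rcond=None)
    pred_don = A_don @ beta
    pred_rec = A_rec @ beta
    # Distance between every recipient and every donor on the predicted scale.
    dist = np.abs(pred_rec[:, None] - pred_don[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]       # k closest donors per row
    pick = rng.integers(0, k, size=len(X_rec))      # random draw among them
    donors = nearest[np.arange(len(X_rec)), pick]
    return y_don[donors]
```

Both functions return observed donor values only, which is the defining property of hot deck methods. The difference lies in the distance: the first requires exact agreement on the common variable, while the second collapses all common variables into a single predicted value, so a close donor can be found even when exact matches are unlikely.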

Taxonomy

Topics: Statistical Methods and Inference