Two-Phase Data Synthesis for Income: An Application to the NHIS
Kevin Ros, Henrik Olsson, Jingchen Hu

TL;DR
This paper introduces a two-phase Bayesian data synthesis method for generating synthetic income data, effectively handling skewness and zeros, and demonstrates its utility using NHIS data.
Contribution
The paper presents a novel two-phase synthesis approach for income data, improving upon single-phase methods by better modeling skewness and zero-inflation.
Findings
Two-phase synthesis outperforms single-phase synthesis in both utility and risk profiles.
Bayesian models effectively handle skewed and zero-inflated income data.
Application to NHIS data demonstrates practical utility.
Abstract
We propose a two-phase synthesis process for synthesizing income, a sensitive variable that is usually highly skewed and has a number of reported zeros. We consider two forms of a continuous income variable: a binary form, which is modeled and synthesized in phase 1; and a non-negative continuous form, which is modeled and synthesized in phase 2. Bayesian synthesis models are proposed for the two-phase synthesis process, and other synthesis models are readily implementable. We demonstrate our methods with applications to a sample from the National Health Interview Survey (NHIS). Utility and risk profiles of generated synthetic datasets are evaluated and compared to results from a single-phase synthesis process.
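The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes no covariates, a Beta-Bernoulli model for the binary form (income > 0) in phase 1, and a normal model on log income with a standard noninformative prior in phase 2. The function name `synthesize_income` and the prior parameters `a0` and `b0` are hypothetical, chosen only for illustration.

```python
import numpy as np

def synthesize_income(income, rng, a0=1.0, b0=1.0):
    """Two-phase synthesis sketch (illustrative assumption, not the paper's model)."""
    income = np.asarray(income, dtype=float)
    n = income.size
    positive = income > 0
    n_pos = int(positive.sum())  # needs at least 2 positive incomes

    # Phase 1: binary form. Draw P(income > 0) from its Beta posterior,
    # then synthesize a zero / positive indicator for every record.
    p = rng.beta(a0 + n_pos, b0 + n - n_pos)
    synth_positive = rng.random(n) < p

    # Phase 2: continuous form. Normal model on log(income) for positive
    # incomes, with prior p(mu, sigma^2) proportional to 1 / sigma^2.
    log_y = np.log(income[positive])
    ybar = log_y.mean()
    s2 = log_y.var(ddof=1)
    sigma2 = (n_pos - 1) * s2 / rng.chisquare(n_pos - 1)  # draw sigma^2
    mu = rng.normal(ybar, np.sqrt(sigma2 / n_pos))         # draw mu | sigma^2

    # Combine: zeros stay zero, positives get a synthetic log-normal value.
    synth = np.zeros(n)
    k = int(synth_positive.sum())
    synth[synth_positive] = np.exp(rng.normal(mu, np.sqrt(sigma2), size=k))
    return synth

# Usage on simulated (not NHIS) data with roughly 20% reported zeros.
rng = np.random.default_rng(0)
confidential = np.where(rng.random(1000) < 0.2, 0.0, rng.lognormal(10, 1, 1000))
synthetic = synthesize_income(confidential, rng)
```

Splitting the variable this way lets the zeros be reproduced exactly as zeros, while the skewed positive part is modeled on the log scale; a single continuous model would have to accommodate both features at once.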
Taxonomy
Topics: Advanced Causal Inference Techniques · Healthcare Policy and Management · Economic and Environmental Valuation
