Applying Data Synthesis for Longitudinal Business Data across Three Countries

M. Jahangir Alam; Benoit Dostie; Jörg Drechsler; Lars Vilhuber

arXiv:2008.02246·econ.EM·November 13, 2020

TL;DR

This paper explores creating secure, synthetic microdata for business statistics across Canada and Germany, aiming to balance data utility with confidentiality, and assesses the feasibility of extending this approach internationally.

Contribution

It demonstrates the application of a previously used synthetic data generation model to new countries and evaluates its utility, protection, and scalability.

Findings

01 Synthetic data maintains analytical validity.

02 Protection against identification is enhanced.

03 Method is feasible and cost-effective for multiple countries.

Abstract

Data on businesses collected by statistical agencies are challenging to protect. Many businesses have unique characteristics, and distributions of employment, sales, and profits are highly skewed. Attackers wishing to conduct identification attacks often have access to much more information about a business than about any individual. As a consequence, most disclosure avoidance mechanisms fail to strike an acceptable balance between usefulness and confidentiality protection. Detailed aggregate statistics by geography or detailed industry classes are rare, public-use microdata on businesses are virtually nonexistent, and access to confidential microdata can be burdensome. Synthetic microdata have been proposed as a secure mechanism to publish microdata, as part of a broader discussion of how to provide wider access to such data sets to researchers. In this article, we document an experiment to create…

Code & Models

Repositories

labordynamicsinstitute/SyntheticLEAP (Official)
