A smoothing approach for masking spatial data
Yijie Zhou, Francesca Dominici, Thomas A. Louis

TL;DR
This paper introduces a flexible spatial smoothing-based data masking method for health data that balances privacy with statistical utility, demonstrated through a Medicare mortality study.
Contribution
It proposes a novel spatial smoothing approach for data masking that allows adjustable privacy-utility trade-offs and incorporates prior spatial knowledge to improve analysis accuracy.
Findings
The method reduces disclosure risk while keeping the MSE of regression parameter estimates low.
Incorporating prior knowledge of the exposure's spatial pattern can decrease the bias and MSE of the parameter estimates.
An application to Medicare mortality data illustrates the method's practical utility and its disclosure-risk assessment.
Abstract
Individual-level health data are often not publicly available due to confidentiality; masked data are released instead. Therefore, it is important to evaluate the utility of using the masked data in statistical analyses such as regression. In this paper we propose a data masking method based on spatial smoothing techniques. The proposed method allows for selecting both the form and the degree of masking, thus resulting in a large degree of flexibility. We investigate the utility of the masked data sets in terms of the mean square error (MSE) of regression parameter estimates when fitting a Generalized Linear Model (GLM) to the masked data. We also show that incorporating prior knowledge of the spatial pattern of the exposure into the data masking may reduce the bias and MSE of the parameter estimates. By evaluating both utility and disclosure risk as functions of the form and…
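The abstract does not state the smoothing formula, so the following is only a minimal sketch under stated assumptions, not the authors' method. Each individual's exposure is replaced by a kernel-weighted average of the exposures at all locations: the kernel choice stands in for the "form" of masking and the bandwidth `h` for its "degree". Ordinary least squares (the Gaussian-family GLM with identity link) stands in for a general GLM, and utility is measured as the Monte Carlo MSE of the slope estimate. The optional `prior` argument, a hypothetical known spatial trend of the exposure, illustrates one possible way prior knowledge could enter: only deviations from the trend are smoothed. All function names, the simulated trend, and the parameter values are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(d, h):
    # Smooth weight that decays with distance d; bandwidth h sets the degree of masking.
    return math.exp(-0.5 * (d / h) ** 2)


def box_kernel(d, h):
    # Uniform weight inside radius h: an alternative "form" of masking.
    return 1.0 if d <= h else 0.0


def smooth_mask(coords, values, h, kernel=gaussian_kernel, prior=None):
    """Replace each value by a kernel-weighted average over all locations.

    Hypothetical extension: if `prior` (an assumed known trend at each location)
    is given, only the deviations from it are smoothed and the trend is added back.
    """
    resid = values if prior is None else [v - p for v, p in zip(values, prior)]
    masked = []
    for i, (xi, yi) in enumerate(coords):
        num = den = 0.0
        for (xj, yj), rj in zip(coords, resid):
            w = kernel(math.hypot(xi - xj, yi - yj), h)
            num += w * rj
            den += w  # includes the point itself (d = 0), so den >= 1
        s = num / den
        masked.append(s if prior is None else s + prior[i])
    return masked


def ols_slope(x, y):
    # Least-squares slope of y on x (Gaussian GLM, identity link).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("masked exposure is constant; bandwidth too large")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def simulate_mse(h, kernel=gaussian_kernel, use_prior=False,
                 n=150, reps=40, beta1=0.5, seed=1):
    """Monte Carlo MSE of the slope fitted to masked exposure (simulated data)."""
    rng = random.Random(seed)
    sq_errors = []
    for _ in range(reps):
        coords = [(rng.random(), rng.random()) for _ in range(n)]
        # Assumed spatial trend of the exposure on the unit square.
        trend = [2.0 * sx + math.sin(3.0 * sy) for sx, sy in coords]
        x = [t + rng.gauss(0.0, 1.0) for t in trend]
        y = [1.0 + beta1 * xi + rng.gauss(0.0, 1.0) for xi in x]
        x_masked = smooth_mask(coords, x, h, kernel,
                               prior=trend if use_prior else None)
        sq_errors.append((ols_slope(x_masked, y) - beta1) ** 2)
    return sum(sq_errors) / reps


if __name__ == "__main__":
    for h in (0.05, 0.1, 0.2):
        print(h, simulate_mse(h), simulate_mse(h, use_prior=True))
```

Running the script prints, for each bandwidth, the simulated MSE with and without the assumed prior trend; larger `h` means stronger masking. Using the true trend as the "prior" is an idealization chosen only to keep the sketch short.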
