Conditional Masking to Numerical Data
Debolina Ghatak, Bimal K Roy

TL;DR
This paper introduces a hybrid data obfuscation method for numerical datasets that balances privacy with utility, providing unbiased distribution estimates and reliable statistical measures.
Contribution
A novel mixed obfuscation technique combining data swapping and noise addition that preserves distribution shape and statistical accuracy.
Findings
Unbiased estimation of the original data distribution achieved.
Reliable estimates of moments and correlation maintained.
Enhanced privacy protection with minimal utility loss.
Abstract
Protecting the privacy of data-sets has become hugely important these days. Many real-life data-sets, such as income data and medical data, need to be secured before they are made public. However, security comes at the cost of losing some useful statistical information about the data-set. Data obfuscation deals with this problem of masking a data-set in such a way that the utility of the data is maximized while the risk of disclosing sensitive information is minimized. Two popular approaches to data obfuscation for numerical data involve (i) data swapping and (ii) adding noise to data. While the former masks well but sacrifices all correlation information, the latter gives estimates for most of the popular statistics, such as mean, variance, quantiles, and correlation, but fails to give an unbiased estimate of the distribution curve of the original data. In this paper, we propose a mixed method…
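The trade-off between the two baseline approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' mixed method, which the truncated abstract does not specify. It only shows plain data swapping (a random permutation of one column) and plain additive Gaussian noise on synthetic data. The function names, the synthetic columns, and the noise level of 5.0 are all assumptions made for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def swap_column(values, rng):
    """Data swapping (illustrative): permute one column across records."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def add_noise(values, sigma, rng):
    """Noise addition (illustrative): add independent zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(0)
n = 5000
noise_sigma = 5.0  # assumed noise level, known to the data publisher

# Hypothetical correlated pair of numerical columns.
x = [rng.gauss(50.0, 10.0) for _ in range(n)]
y = [0.8 * xi + rng.gauss(0.0, 6.0) for xi in x]

x_swapped = swap_column(x, rng)
x_noisy = add_noise(x, noise_sigma, rng)

# Swapping keeps the marginal values exactly but breaks the link to y.
corr_original = pearson(x, y)
corr_swapped = pearson(x_swapped, y)

# Noise widens the released distribution; the variance can be corrected
# by subtracting the known noise variance, but the released values
# themselves no longer follow the original distribution curve.
var_original = statistics.pvariance(x)
var_noisy = statistics.pvariance(x_noisy)
var_corrected = var_noisy - noise_sigma ** 2

print("correlation original vs swapped:", corr_original, corr_swapped)
print("variance original, noisy, corrected:", var_original, var_noisy, var_corrected)
```

In this sketch the swapped column has exactly the same values as the original, so its empirical distribution is untouched while its correlation with the other column is destroyed. The noisy column stays linked to its record, so moments can be recovered after a correction, but its raw histogram is a smeared version of the original.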
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Chaos-based Image/Signal Encryption
