# Data Masking with Privacy Guarantees

**Authors:** Anh T. Pham, Shalini Ghosh, Vinod Yegneswaran

arXiv: 1901.02185 · 2019-01-09

## TL;DR

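This paper introduces a data masking technique that releases private data with a privacy guarantee while keeping a classifier trained on the masked data similar to one trained on the original data. The method achieves lower risk than traditional input perturbation, especially as the number of training samples grows.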

## Contribution

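The paper proposes a data masking method with a privacy guarantee, designed so that a classifier trained on the masked data stays close to the classifier trained on the original data. It analyzes the theoretical risk of both this method and input perturbation, and evaluates the method empirically on 12 benchmark datasets.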

## Key findings

- Lower theoretical risk than input perturbation
- Effectiveness illustrated on 12 benchmark datasets
- The risk advantage over input perturbation is most pronounced when the number of training samples is large
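
Input perturbation, the baseline named in the findings above, adds random noise to each record before release. The following is a minimal sketch of that baseline only, not the paper's masking method. It assumes Laplace noise with a fixed `scale`, a toy two-class Gaussian dataset, and a nearest-centroid classifier, none of which come from the paper; the helper names (`perturb_inputs`, `fit_centroids`, `centroid_gap`, `demo`) are hypothetical. The noise scale is not calibrated to any sensitivity bound, so the sketch carries no formal privacy guarantee.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale
    # is Laplace-distributed with location 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_inputs(rows, scale, rng):
    # Input perturbation: add independent noise to every feature value.
    return [[x + laplace_noise(scale, rng) for x in row] for row in rows]


def fit_centroids(rows, labels):
    # Toy classifier (an assumption of this sketch): one mean vector per class.
    sums, counts = {}, {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, x in enumerate(row):
            acc[j] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def centroid_gap(a, b):
    # Largest coordinate-wise difference between two fitted classifiers.
    return max(abs(u - v) for y in a for u, v in zip(a[y], b[y]))


def make_data(n, rng):
    # Synthetic two-class data; class 0 centered at -2, class 1 at +2.
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        center = 2.0 if y else -2.0
        rows.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        labels.append(y)
    return rows, labels


def demo(n, scale=1.0, seed=0):
    # Train on original and on perturbed data, then compare the classifiers.
    rng = random.Random(seed)
    rows, labels = make_data(n, rng)
    noisy = perturb_inputs(rows, scale, rng)
    return centroid_gap(fit_centroids(rows, labels), fit_centroids(noisy, labels))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, demo(n))
```

Comparing fitted parameters, as `centroid_gap` does, is one simple way to express "the classifier trained on the released data is similar to the classifier trained on the original data"; the paper's own risk definition may differ.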

## Abstract

We study the problem of data release with privacy, where data is made available with privacy guarantees while keeping the usability of the data as high as possible. This is important in health care and other domains with sensitive data. In particular, we propose a method of masking the private data with a privacy guarantee while ensuring that a classifier trained on the masked data is similar to the classifier trained on the original data, to maintain usability. We analyze the theoretical risks of the proposed method and the traditional input perturbation method. Results show that the proposed method achieves lower risk than input perturbation, especially when the number of training samples gets large. We illustrate the effectiveness of the proposed method of data masking for privacy-sensitive learning on $12$ benchmark datasets.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.02185/full.md

## Figures

32 figures with captions in the complete paper: https://tomesphere.com/paper/1901.02185/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/1901.02185/full.md

---
Source: https://tomesphere.com/paper/1901.02185