Statistical Privacy Guarantees of Machine Learning Preprocessing Techniques

Ashly Lau; Jonathan Passerat-Palmbach

arXiv:2109.02496·cs.LG·September 7, 2021

Open Access

TL;DR

This paper introduces a statistical framework to empirically measure privacy leakage in machine learning preprocessing, revealing that resampling techniques for imbalanced datasets can compromise privacy, thus emphasizing the need for private preprocessing methods.

Contribution

It adapts a statistical privacy violation detection framework for ML pipelines and demonstrates privacy leaks caused by common resampling techniques.

Findings

01

Resampling techniques increase privacy leakage in models.

02

The framework effectively measures privacy levels in preprocessing.

03

The results highlight the need for private preprocessing methods.

Abstract

Differential privacy provides strong privacy guarantees for machine learning applications. Much recent work has focused on developing differentially private models; however, other stages of the machine learning pipeline have received less attention, in particular the preprocessing phase. Our contributions are twofold: we adapt a privacy violation detection framework based on statistical methods to empirically measure privacy levels of machine learning pipelines, and we apply the newly created framework to show that resampling techniques used when dealing with imbalanced datasets cause the resultant model to leak more private information. These results highlight the need for developing private preprocessing techniques.
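To make the resampling concern concrete, here is a minimal sketch of random oversampling, one common way to rebalance an imbalanced dataset. The abstract does not say which resampling techniques were evaluated, so the choice of random oversampling, the function name `random_oversample`, and the toy data are all assumptions made for illustration. The point it shows is that the same minority record can enter the training set several times, which gives that individual more influence over the trained model.

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Duplicate minority-class records at random until every class
    has as many records as the largest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group records by class label.
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)

    target = max(len(recs) for recs in by_class.values())

    out_records, out_labels = [], []
    for lab, recs in by_class.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(recs) for _ in range(target - len(recs))]
        for rec in recs + extra:
            out_records.append(rec)
            out_labels.append(lab)
    return out_records, out_labels


# Toy data (assumed): 8 majority-class records and 2 minority-class records.
records = [f"maj{i}" for i in range(8)] + ["min0", "min1"]
labels = [0] * 8 + [1] * 2

res_records, res_labels = random_oversample(records, labels)
copies = Counter(res_records)

print("class sizes after resampling:", Counter(res_labels))
print("copies of each minority record:", copies["min0"], copies["min1"])
```

After resampling, the two minority records together fill eight slots in the training set, while every majority record still appears exactly once. A privacy analysis that assumes each person contributes one record no longer describes the data the model actually sees.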

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI