Leakage of Dataset Properties in Multi-Party Machine Learning

Wanrong Zhang; Shruti Tople; Olga Ohrimenko

arXiv:2006.07267·cs.LG·June 21, 2021·28 cites

TL;DR

This paper demonstrates that multi-party machine learning can unintentionally leak global dataset properties, such as attribute distributions, even when only black-box access to the final model is provided, raising privacy concerns.

Contribution

The paper shows how dataset property leakage occurs in multi-party ML and analyzes which factors influence this leakage across various data types and models.

Findings

01. Leakage of population-level properties is possible even when the sensitive attribute is not directly included in the training data.

02. Leakage persists even when the sensitive attribute has low correlation with the rest of the data.

03. Various data types, including tabular, text, and graph data, are vulnerable to this leakage.

Abstract

Secure multi-party machine learning allows several parties to build a model on their pooled data to increase utility while not explicitly sharing data with each other. We show that such multi-party computation can cause leakage of global dataset properties between the parties even when parties obtain only black-box access to the final model. In particular, a "curious" party can infer the distribution of sensitive attributes in other parties' data with high accuracy. This raises concerns regarding the confidentiality of properties pertaining to the whole dataset as opposed to individual data records. We show that our attack can leak population-level properties in datasets of different types, including tabular, text, and graph data. To understand and measure the source of leakage, we consider several models of correlation between a sensitive attribute and the rest of the data. Using…
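
The abstract does not spell out the attack mechanics. As an illustration only, the following is a minimal sketch of a generic shadow-model property inference attack, not the paper's exact method or code. Everything in it is an assumption introduced for this example: the toy data generator (`make_dataset`), logistic-regression models trained by plain gradient descent, a fixed set of attack queries, two candidate fractions (0.2 and 0.8) for the victim's sensitive attribute, and a nearest-centroid meta-classifier. The curious party pools its own data with simulated victim data under each hypothesis, trains shadow models, and compares their outputs with those of the jointly trained target model, which it can only query.

```python
import math
import random

# Toy setting (hypothetical, not taken from the paper): each record has two
# model inputs (x1, x2), a sensitive binary attribute s, and a label y.
# s is never given to the model, but x2 and y are both correlated with it.


def make_dataset(n, frac_s, rng):
    """Sample n (features, label) records; each has s = 1 with probability frac_s."""
    data = []
    for _ in range(n):
        s = 1 if rng.random() < frac_s else 0
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(2.0 * s - 1.0, 1.0)  # noisy proxy for s
        y = 1 if x1 + (2 * s - 1) + rng.gauss(0.0, 0.5) > 0 else 0
        data.append(((x1, x2), y))  # s itself is dropped
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(data, epochs=50, lr=0.5):
    """Full-batch gradient descent for logistic regression on (x1, x2)."""
    w1 = w2 = b = 0.0
    n = len(data)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in data:
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    # Black-box access: expose only a prediction function, not the weights.
    return lambda x: sigmoid(w1 * x[0] + w2 * x[1] + b)


def query(model, queries):
    """Output vector of a model on a fixed set of attack queries."""
    return [model(q) for q in queries]


def infer_fraction(target, own_data, victim_n, hypotheses, queries, n_shadow, rng):
    """Guess which candidate fraction of s = 1 the victim's data has.

    Assumes the attacker can sample victim-like data for any candidate fraction.
    """
    centroids = {}
    for frac in hypotheses:
        outputs = []
        for _ in range(n_shadow):
            # Shadow model: attacker's own data pooled with simulated victim data.
            shadow = train_logreg(own_data + make_dataset(victim_n, frac, rng))
            outputs.append(query(shadow, queries))
        # Nearest-centroid meta-classifier: mean output vector per hypothesis.
        centroids[frac] = [sum(col) / n_shadow for col in zip(*outputs)]
    observed = query(target, queries)
    return min(
        hypotheses,
        key=lambda f: sum((c - o) ** 2 for c, o in zip(centroids[f], observed)),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    own = make_dataset(200, 0.5, rng)  # curious party's own data
    queries = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(10)]
    for true_frac in (0.2, 0.8):
        victim = make_dataset(200, true_frac, rng)  # never seen by the attacker
        target = train_logreg(own + victim)  # jointly trained, black-box model
        guess = infer_fraction(target, own, 200, [0.2, 0.8], queries, 8, rng)
        print(f"true fraction {true_frac}, inferred {guess}")
```

In this toy setting the sensitive attribute is never a model input, yet shifting its share in the victim's data moves the pooled label rate and the proxy feature, and that shift shows up in the target model's outputs. A realistic attack would use the actual model family, many more shadow models, and a learned meta-classifier instead of nearest-centroid matching.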

Code & Models

Repositories

epfl-dlab/property-inference-attacks (PyTorch)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Data Quality and Management