Machine Learning Models that Remember Too Much

Congzheng Song; Thomas Ristenpart; Vitaly Shmatikov

arXiv:1709.07886 · cs.CR · September 28, 2017 · 31 cites

TL;DR

This paper shows that model-training code can be deliberately designed so that the resulting model memorizes sensitive training data without sacrificing predictive accuracy, and that an adversary can later extract the memorized information from the model.

Contribution

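The authors introduce practical training algorithms, some closely resembling standard techniques such as regularization and data augmentation, that make a model memorize information about its training data while remaining as accurate as a conventionally trained model. This exposes a privacy risk when the training code comes from a malicious ML provider who later obtains white- or black-box access to the model.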

Findings

01. Models can memorize information about their training data while remaining as accurate as conventionally trained models.

02. The memorized information can be accurately extracted from the models.

03. The techniques are effective across image, face recognition, and text tasks.

Abstract

Machine learning (ML) is becoming a commodity. Numerous ML frameworks and services are available to data holders who are not ML experts but want to train predictive models on their data. It is important that ML models trained on sensitive inputs (e.g., personal images or documents) not leak too much information about the training data. We consider a malicious ML provider who supplies model-training code to the data holder, does not observe the training, but then obtains white- or black-box access to the resulting model. In this setting, we design and implement practical algorithms, some of them very similar to standard ML techniques such as regularization and data augmentation, that "memorize" information about the training dataset in the model yet the model is as accurate and predictive as a conventionally trained model. We then explain how the adversary can extract memorized…
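To make the white-box setting concrete, here is a minimal sketch of how training code could hide bytes of training data in a model's parameters and how someone holding the parameters could read them back. It is an illustration under stated assumptions, not the authors' implementation: it uses plain low-order-bit hiding rather than the regularization-like or augmentation-like techniques the abstract mentions, and the function names `encode_lsb` and `extract_lsb`, the one-byte-per-parameter capacity, and the example record are all hypothetical.

```python
import numpy as np


def encode_lsb(params: np.ndarray, secret: bytes) -> np.ndarray:
    """Return a float32 copy of `params` with `secret` hidden in it.

    Assumption: one secret byte per parameter, stored in the 8 lowest
    mantissa bits of the float32 value, so sign and exponent are untouched.
    """
    payload = np.frombuffer(secret, dtype=np.uint8).astype(np.uint32)
    if payload.size > params.size:
        raise ValueError("secret does not fit in the given parameters")
    # astype() copies, so the caller's array is never modified.
    raw = params.astype(np.float32).ravel().view(np.uint32)
    n = payload.size
    # Clear the low 8 bits, then write one secret byte into them.
    raw[:n] = (raw[:n] & np.uint32(0xFFFFFF00)) | payload
    return raw.view(np.float32).reshape(params.shape)


def extract_lsb(params: np.ndarray, n_bytes: int) -> bytes:
    """Read back `n_bytes` hidden by `encode_lsb` (white-box access)."""
    raw = np.ascontiguousarray(params, dtype=np.float32).ravel().view(np.uint32)
    return (raw[:n_bytes] & np.uint32(0xFF)).astype(np.uint8).tobytes()


# Stand-in for trained weights; a real model would supply these.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 8)).astype(np.float32)

secret = b"hypothetical training record"  # placeholder, not real data
stego = encode_lsb(weights, secret)
recovered = extract_lsb(stego, len(secret))

# Largest relative perturbation introduced by the hidden bytes.
max_rel_change = float(np.max(np.abs(stego - weights) / np.abs(weights)))
print(recovered, max_rel_change)
```

Because only the lowest mantissa bits change, each parameter moves by a tiny relative amount, which is the intuition behind a model staying accurate while carrying extra information. Whether accuracy is actually preserved for a given model and capacity is an empirical question this sketch does not answer.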

Code & Models

Repositories

csong27/ml-model-remember (Official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Privacy-Preserving Technologies in Data