Machine Learning Models that Remember Too Much
Congzheng Song, Thomas Ristenpart, Vitaly Shmatikov

TL;DR
This paper demonstrates how machine learning models can be intentionally designed to memorize sensitive training data without sacrificing predictive accuracy, and shows how such memorization can be exploited to extract private information.
Contribution
The authors introduce practical algorithms that make a model memorize its training data while maintaining accuracy, highlighting the privacy risk that arises when a malicious ML provider supplies the training code and later gains white- or black-box access to the resulting model.
Findings
Models can memorize training data while remaining as accurate as conventionally trained models.
Memorized information can be accurately extracted from models.
Techniques are effective across image, face recognition, and text tasks.
Abstract
Machine learning (ML) is becoming a commodity. Numerous ML frameworks and services are available to data holders who are not ML experts but want to train predictive models on their data. It is important that ML models trained on sensitive inputs (e.g., personal images or documents) not leak too much information about the training data. We consider a malicious ML provider who supplies model-training code to the data holder, does not observe the training, but then obtains white- or black-box access to the resulting model. In this setting, we design and implement practical algorithms, some of them very similar to standard ML techniques such as regularization and data augmentation, that "memorize" information about the training dataset in the model yet the model is as accurate and predictive as a conventionally trained model. We then explain how the adversary can extract memorized…
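The abstract notes that some of the algorithms closely resemble standard techniques such as regularization. As a rough illustration only, and not the authors' algorithm, the Python sketch below assumes a hypothetical hinge-style penalty that pushes the sign of each weight of a small linear model toward one secret bit; anyone with white-box access could then read the bits back from the signs. The function names (`train_with_sign_penalty`, `extract_bits`), the hyperparameters (`lam`, `margin`, `lr`, `epochs`), and the toy data are all assumptions made for this example.

```python
def train_with_sign_penalty(X, y, secret_bits, lam=0.1, margin=0.05,
                            lr=0.05, epochs=500):
    """Fit a linear model by gradient descent on mean squared error,
    plus a hypothetical penalty lam * max(0, margin - t_j * w_j) that
    nudges the sign of weight j toward t_j (+1 for bit 1, -1 for bit 0)."""
    n, d = len(X), len(X[0])
    assert len(secret_bits) == d, "one secret bit per weight in this sketch"
    targets = [1.0 if b else -1.0 for b in secret_bits]
    w = [0.0] * d
    for _ in range(epochs):
        # Gradient of the mean squared error term.
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += 2.0 * err * xi[j] / n
        for j in range(d):
            # The penalty is active only while the weight has the wrong
            # sign or sits inside the margin; its gradient is -lam * t_j.
            if targets[j] * w[j] < margin:
                grad[j] -= lam * targets[j]
            w[j] -= lr * grad[j]
    return w


def extract_bits(w):
    """White-box extraction for this sketch: read one bit per weight sign."""
    return [1 if wj > 0 else 0 for wj in w]


# Toy data (assumed): only the first two features carry signal, so the
# last two weights are free to carry secret bits at little cost.
X_demo = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
y_demo = [2.0, -1.0, 0.0, 0.0]
secret_demo = [1, 0, 1, 0]

w_demo = train_with_sign_penalty(X_demo, y_demo, secret_demo)
print("weights:", [round(v, 3) for v in w_demo])
print("recovered bits:", extract_bits(w_demo))
```

In this toy setup the secret bits for the two informative weights were chosen to agree with the signs the task already requires. When they disagree, the penalty and the training loss pull in opposite directions, which is the accuracy-versus-capacity trade-off that a scheme of this kind has to manage.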
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Privacy-Preserving Technologies in Data
