The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks

Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song

arXiv:1802.08232 · cs.LG · July 17, 2019 · 505 citations

TL;DR

This paper introduces a methodology to evaluate and test the unintended memorization of rare or unique data in neural network models, highlighting privacy risks and providing practical tools to mitigate data exposure.

Contribution

It presents a new testing approach for quantifying memorization in generative models and demonstrates its effectiveness in real-world applications like Google's Smart Compose.
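As a rough illustration of what quantifying memorization can mean, the sketch below assumes the canary-and-exposure setup described in the paper's full text. A random secret (the canary) drawn from a space of candidates R is inserted into the training data. Its exposure is log2|R| − log2 rank, where rank is the canary's position when all candidates are sorted by the trained model's log-perplexity. The names `log_perplexity`, `exposure`, and `toy_score` are hypothetical, and `toy_score` is a stand-in for scoring each candidate with the real model under test.

```python
import math
from typing import Callable, Iterable, Sequence


def log_perplexity(token_probs: Sequence[float]) -> float:
    """Log-perplexity in bits: the sum of -log2 Pr(token | prefix) over a sequence."""
    return -sum(math.log2(p) for p in token_probs)


def exposure(canary: str, candidates: Iterable[str],
             score: Callable[[str], float]) -> float:
    """Rank-based exposure, log2|R| - log2 rank(canary).

    `candidates` is the whole randomness space R and must contain the canary;
    `score` returns a candidate's log-perplexity under the model being tested.
    Rank 1 means no candidate is more likely than the canary.
    """
    space = list(candidates)
    if canary not in space:
        raise ValueError("the canary must be a member of the candidate space")
    canary_score = score(canary)
    # Rank = 1 + number of candidates with strictly lower log-perplexity
    # (ties are resolved in the canary's favor).
    rank = 1 + sum(1 for c in space if score(c) < canary_score)
    return math.log2(len(space)) - math.log2(rank)


if __name__ == "__main__":
    space = [f"{i:03d}" for i in range(1000)]  # R: every 3-digit secret
    secret = "281"

    # Hypothetical stand-in for a trained model: the inserted canary gets a much
    # lower log-perplexity than every other candidate.
    def toy_score(s: str) -> float:
        return 2.0 if s == secret else 10.0 + int(s) / 1000

    print(round(exposure(secret, space, toy_score), 2))  # 9.97, i.e. log2(1000)
```

A fully memorized canary reaches the maximum exposure log2|R|, while a canary the model ranks near the middle of R has exposure close to 1. In practice `score` would wrap `log_perplexity` over the model's per-token probabilities. Enumerating all of R is only feasible for small spaces. The paper's full text discusses approximating exposure from a sample of candidates instead, which this sketch does not attempt.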

Findings

1. Unintended memorization is persistent and hard to avoid.
2. For models trained without regard to memorization, efficient extraction procedures can recover secret sequences such as credit card numbers.
3. Practical testing can help limit data exposure in commercial models.

Abstract

This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models, a common type of machine-learning model. Because such models are sometimes trained on sensitive data (e.g., the text of users' private messages), this methodology can benefit privacy by allowing deep-learning practitioners to select means of training that minimize such memorization. In experiments, we show that unintended memorization is a persistent, hard-to-avoid issue that can have serious consequences. Specifically, for models trained without consideration of memorization, we describe new, efficient procedures that can extract unique, secret sequences, such as credit card numbers. We show that our testing strategy is a practical and easy-to-use first line of defense, e.g., by describing its…
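To make the idea of efficient extraction concrete, here is a minimal sketch of one approach. It assumes a model that exposes next-token probabilities and runs a best-first (Dijkstra-style) search over partial secrets ordered by log-perplexity. The paper's full text frames extraction as a shortest-path search of this kind, but this code is only an illustration, not the authors' implementation. `extract`, `NextProbs`, and `toy_next_probs` are hypothetical names, and the toy model is a stand-in that has partly memorized one 4-digit secret.

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

# A model is represented (hypothetically) as: prefix -> {next token: probability}.
NextProbs = Callable[[str], Dict[str, float]]


def extract(next_probs: NextProbs, length: int, k: int = 1) -> List[Tuple[float, str]]:
    """Return up to k completions of exactly `length` tokens, lowest log-perplexity first.

    Best-first search: each partial completion is prioritized by its log-perplexity
    so far. Every step adds -log2 p >= 0, so costs never decrease along a path and
    complete sequences are popped in order of total log-perplexity.
    """
    heap: List[Tuple[float, str]] = [(0.0, "")]
    results: List[Tuple[float, str]] = []
    while heap and len(results) < k:
        cost, partial = heapq.heappop(heap)
        # Tokens here are single characters, so len(partial) is the token count.
        if len(partial) == length:
            results.append((cost, partial))
            continue
        for token, p in next_probs(partial).items():
            if p > 0:  # zero-probability tokens can never appear in a completion
                heapq.heappush(heap, (cost - math.log2(p), partial + token))
    return results


if __name__ == "__main__":
    secret = "7214"
    digits = "0123456789"

    def toy_next_probs(prefix: str) -> Dict[str, float]:
        # Hypothetical toy model that has partly memorized `secret`: the memorized
        # digit gets probability 0.5 and the other nine digits share the rest.
        target = secret[len(prefix)]
        return {d: 0.5 if d == target else 0.5 / 9 for d in digits}

    print(extract(toy_next_probs, length=4, k=1))  # [(4.0, '7214')]
```

Because each step adds a non-negative cost of −log2 p, complete candidates leave the queue in order of increasing log-perplexity. The search can therefore stop after k results instead of scoring all 10,000 four-digit candidates.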

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Digital and Cyber Forensics

Methods: Approximate Bayesian Computation