Training Data Reconstruction: Privacy due to Uncertainty?

Christina Runkel; Kanchana Vaishnavi Gandikota; Jonas Geiping; Carola-Bibiane Schönlieb; Michael Moeller

arXiv:2412.08544 · cs.LG · December 12, 2024

Open Access

TL;DR

This paper investigates the privacy risks of reconstructing training data from neural network parameters. It finds that the initialisation strongly influences reconstruction quality and that reconstructed images may not belong to the original training set, which complicates assessing the actual privacy risk.

Contribution

It introduces a new bilevel optimisation formulation of training data reconstruction and empirically analyses how the initialisation affects both reconstruction accuracy and the resulting privacy implications.
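As a rough orientation, a reconstruction problem of this kind can be written generically as follows. The outer problem searches for inputs $x$ whose trained parameters match the released parameters $\theta^\ast$, and the inner problem is the training procedure itself. The symbols $\theta^\ast$, $y$, $\mathcal{L}$ and the squared-distance outer objective are illustrative assumptions; the paper's exact formulation may differ.

```latex
\min_{x}\; \tfrac{1}{2}\,\big\lVert \theta(x) - \theta^{\ast} \big\rVert_2^2
\quad \text{subject to} \quad
\theta(x) \in \operatorname*{arg\,min}_{\theta}\; \mathcal{L}(\theta;\, x, y)
```

Because the inner arg min is generally not unique in $x$ (many inputs can lead to the same $\theta^\ast$), the solution found by a local optimiser depends on where $x$ is initialised.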

Findings

01

Reconstruction quality depends heavily on initialisation.

02

Random initialisation can produce plausible but non-training images.

03

Reconstructed images may not be identifiable as training data.
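A minimal toy sketch of why initialisation matters, assuming a bias-free linear least-squares model rather than the networks studied in the paper: any inputs $X$ with $X w^\ast = y$ (and full column rank) make the released weights $w^\ast$ an exact least-squares solution, so projecting different random initialisations onto that set yields different but equally consistent reconstructions. The helpers `fit_least_squares` and `project_onto_consistent_set` are hypothetical. This set is only a subset of all datasets consistent with $w^\ast$, and the true training inputs generally lie outside it.

```python
import numpy as np

def fit_least_squares(X, y):
    """'Training' of a bias-free linear model (assumed): ordinary least squares, argmin_w ||Xw - y||^2."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def project_onto_consistent_set(X0, y, w_star):
    """Closest X to X0 (Frobenius norm) satisfying X @ w_star == y.

    For such an X with full column rank, w_star is the unique least-squares
    solution, i.e. 'training' on (X, y) reproduces the released weights exactly.
    """
    residual = y - X0 @ w_star
    return X0 + np.outer(residual, w_star) / float(w_star @ w_star)

# Usage: hypothetical true inputs and two random initialisations.
rng = np.random.default_rng(0)
N, d = 6, 3
X_true = rng.normal(size=(N, d))
y = rng.normal(size=N)
w_star = fit_least_squares(X_true, y)          # released weights
for seed in (1, 2):
    X0 = np.random.default_rng(seed).normal(size=(N, d))
    X_rec = project_onto_consistent_set(X0, y, w_star)
    print(f"seed={seed}  weights reproduced: {np.allclose(fit_least_squares(X_rec, y), w_star)}"
          f"  ||X_rec - X_true||={np.linalg.norm(X_rec - X_true):.3f}")
```

Each seed gives a different reconstruction that reproduces $w^\ast$, none of which needs to coincide with the data the model was actually trained on.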

Abstract

Being able to reconstruct training data from the parameters of a neural network is a major privacy concern. Previous works have shown that reconstructing training data is possible under certain circumstances. In this work, we analyse such reconstructions empirically and propose a new formulation of the reconstruction as the solution to a bilevel optimisation problem. We demonstrate that both our formulation and previous approaches depend strongly on the initialisation of the training images $x$ to be reconstructed. In particular, we show that a random initialisation of $x$ can lead to reconstructions that resemble valid training samples while not being part of the actual training dataset. Thus, our experiments on affine and one-hidden-layer networks suggest that, when reconstructing natural images, an adversary cannot identify whether reconstructed images have indeed been part of the set…
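The bilevel view can be sketched on a small scale, assuming a ridge-regression inner problem (so the inner solution has a closed form) and a squared-distance outer objective; this is an illustrative stand-in, not the authors' formulation or code. The outer gradient with respect to the inputs $X$ comes from implicit differentiation of the inner optimality condition $(X^\top X + \lambda I)\,w = X^\top y$, and gradient descent with Armijo backtracking is run from a chosen initialisation. All function names and parameters (`inner_solve`, `outer_loss_and_grad`, `reconstruct`, `lam`, `lr`) are hypothetical.

```python
import numpy as np

def inner_solve(X, y, lam):
    """Inner problem (assumed): ridge regression, argmin_w 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def outer_loss_and_grad(X, y, w_star, lam):
    """Outer objective 0.5*||w(X) - w_star||^2 and its gradient w.r.t. X via implicit differentiation."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ y)      # inner optimum w(X)
    diff = w - w_star
    loss = 0.5 * float(diff @ diff)
    v = np.linalg.solve(A, diff)          # adjoint vector: A v = dL/dw (A is symmetric)
    r = X @ w - y                         # inner residual at w(X)
    grad = -(np.outer(r, v) + np.outer(X @ v, w))
    return loss, grad

def reconstruct(y, w_star, lam, X0, steps=500, lr=1.0):
    """Gradient descent on the inputs X, starting from initialisation X0, with Armijo backtracking."""
    X = X0.copy()
    loss, grad = outer_loss_and_grad(X, y, w_star, lam)
    for _ in range(steps):
        g2 = float(np.sum(grad * grad))
        if g2 < 1e-20:                    # numerically stationary: stop
            break
        t = lr
        while t > 1e-12:
            X_new = X - t * grad
            new_loss, new_grad = outer_loss_and_grad(X_new, y, w_star, lam)
            if new_loss <= loss - 0.5 * t * g2:   # sufficient-decrease condition
                break
            t *= 0.5
        else:
            break                         # no acceptable step found: stop
        X, loss, grad = X_new, new_loss, new_grad
    return X, loss

# Usage: the attacker is assumed to know y, lam and the released weights w_star, but not X_true.
rng = np.random.default_rng(0)
N, d, lam = 5, 3, 0.1
X_true = rng.normal(size=(N, d))
y = rng.normal(size=N)
w_star = inner_solve(X_true, y, lam)
for seed in (1, 2):
    X0 = np.random.default_rng(seed).normal(size=(N, d))
    X_rec, loss = reconstruct(y, w_star, lam, X0)
    print(f"seed={seed}  outer loss={loss:.2e}  ||X_rec - X_true||={np.linalg.norm(X_rec - X_true):.3f}")
```

The line search guarantees that the outer loss never increases, while different seeds generally end at different inputs, a toy-scale analogue of the initialisation dependence described in the abstract.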

Taxonomy

Topics: Privacy-Preserving Technologies in Data

Methods: Sparse Evolutionary Training