On InstaHide, Phase Retrieval, and Sparse Matrix Factorization
Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo

TL;DR
This paper analyzes the security of the InstaHide scheme for private data sharing in distributed learning. It connects that security to a phase retrieval problem and proposes an algorithm that recovers private vectors when the data are assumed to be Gaussian.
Contribution
It establishes a novel link between InstaHide security and phase retrieval complexity, and introduces an algorithm for recovering private vectors assuming Gaussian data.
Findings
InstaHide security is related to the complexity of a new phase retrieval problem.
An algorithm with provable guarantees can recover private vectors from public and synthetic data.
Recovery is feasible when the data are assumed to be isotropic Gaussian.
Abstract
In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning. To generate a synthetic training example to be shared among the distributed learners, InstaHide takes a convex combination of private feature vectors and randomly flips the sign of each entry of the resulting vector with probability 1/2. A salient question is whether this scheme is secure in any provable sense, perhaps under a plausible hardness assumption and assuming the distributions generating the public and private data satisfy certain properties. We show that the answer to this appears to be quite subtle and closely related to the average-case complexity of a new multi-task, missing-data version of the classic problem of phase retrieval. Motivated by this connection, we…
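
The encoding step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `instahide_encode`, the Dirichlet-distributed mixing weights, the number of mixed vectors `k`, and the Gaussian toy data are all assumptions made for the example. It mixes a few private vectors with convex weights, then flips the sign of each entry independently with probability 1/2.

```python
import numpy as np

def instahide_encode(private, k, rng):
    """Mix k private rows with convex weights, then randomly flip entry signs.

    private: (n, d) array of private feature vectors (hypothetical toy data).
    Returns the synthetic vector, the chosen row indices, and the weights.
    """
    n, d = private.shape
    idx = rng.choice(n, size=k, replace=False)    # which private vectors to mix
    weights = rng.dirichlet(np.ones(k))           # assumed: nonnegative, sums to 1
    mixed = weights @ private[idx]                # convex combination, shape (d,)
    signs = rng.choice([-1.0, 1.0], size=d)       # each sign flipped w.p. 1/2
    return signs * mixed, idx, weights

rng = np.random.default_rng(0)
private = rng.standard_normal((10, 8))            # assumed isotropic Gaussian data
synthetic, idx, weights = instahide_encode(private, k=3, rng=rng)

# The sign flips hide the signs, but entrywise magnitudes are unchanged.
magnitudes_match = np.allclose(np.abs(synthetic), np.abs(weights @ private[idx]))
```

Because only the magnitudes of the mixed entries survive, an observer sees absolute values of linear combinations of the private vectors, which is the kind of measurement that phase retrieval deals with.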
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Forensic and Genetic Research
