Private Data Imputation
Abdelkarim Kati, Florian Kerschbaum, Marina Blanton

TL;DR
This paper introduces optimized protocols for private data imputation that preserve privacy and significantly improve accuracy over local imputation, especially for distributed datasets.
Contribution
It presents the first optimized protocols for private data imputation, covering both horizontally and vertically split datasets, and reduces most of the computation to private set intersection (or oblivious programmable pseudo-random function) protocols.
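To make the idea of imputing across a vertically split dataset concrete, here is a minimal plaintext sketch in Python. It is not the paper's protocol and offers no privacy: the record IDs, the attribute names (ages, incomes), the nearest-neighbor fill rule, and both function names are assumptions made up for illustration. The plain set intersection on record IDs only marks the place where a private protocol would use private set intersection instead.

```python
from statistics import mean


def local_mean_impute(column):
    """Fill missing values (None) with the mean of the locally known values."""
    known = [v for v in column.values() if v is not None]
    fill = mean(known)
    return {k: (fill if v is None else v) for k, v in column.items()}


def joint_nearest_impute(target, helper):
    """Fill missing values in `target` from the record closest in `helper`.

    Only record IDs held by both parties can be used. The plaintext
    intersection below is an illustrative stand-in (assumption) for the
    step a private protocol would perform with private set intersection.
    """
    shared = target.keys() & helper.keys()  # NOT private: illustration only
    donors = [k for k in shared if target[k] is not None]
    result = dict(target)
    for k in shared:
        if target[k] is None and donors:
            # Pick the donor whose helper attribute is closest to this record's.
            nearest = min(donors, key=lambda d: abs(helper[d] - helper[k]))
            result[k] = target[nearest]
    return result


# Hypothetical data: party A holds ages (one missing), party B holds incomes.
ages = {"r1": 25, "r2": 60, "r3": None, "r4": 41}
incomes = {"r1": 30_000, "r2": 90_000, "r3": 88_000, "r5": 50_000}

print(local_mean_impute(ages)["r3"])              # local fill: mean of 25, 60, 41
print(joint_nearest_impute(ages, incomes)["r3"])  # joint fill: copied from r2
```

In this toy data the local fill for r3 is the column mean, while the joint fill borrows the age of r2, the record whose income is closest to that of r3. Whether the joint value is actually more accurate depends on how strongly the two attributes are related, which the sketch does not measure.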
Findings
20% accuracy improvement for vertically split data
5% accuracy improvement for horizontally split data
Up to 32.7 times better imputation quality in worst-case scenarios
Abstract
Data imputation is an important data preparation task where the data analyst replaces missing or erroneous values to increase the expected accuracy of downstream analyses. The accuracy improvement of data imputation extends to private data analyses across distributed databases. However, existing data imputation methods violate the privacy of the data, rendering the privacy protection in the downstream analyses obsolete. We conclude that private data analysis requires private data imputation. In this paper, we present the first optimized protocols for private data imputation. We consider the case of horizontally and vertically split data sets. Our optimization aims to reduce most of the computation to private set intersection (or at least oblivious programmable pseudo-random function) protocols, which can be computed very efficiently. We show that private data imputation has -- on…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Data Quality and Management
