Private Data Imputation

Abdelkarim Kati; Florian Kerschbaum; Marina Blanton

arXiv:2511.20832 · cs.CR · November 27, 2025

Open Access

TL;DR

This paper introduces optimized protocols for private data imputation that preserve privacy and significantly improve accuracy over local imputation, especially for distributed datasets.

Contribution

It presents the first efficient protocols for private data imputation applicable to horizontally and vertically split datasets, reducing computation to private set intersection.
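To make the two data layouts concrete, here is a minimal plaintext sketch with no privacy protection at all. It is not the paper's protocol. The records, the column meanings, the `mean_impute` helper, and the choice of mean imputation are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical toy records: (record_id, age, income); None marks a missing value.
# Horizontal split: both parties hold the same columns for different rows.
party_a_rows = [(1, 30, 40.0), (2, 50, None)]
party_b_rows = [(3, 40, 60.0), (4, 60, 80.0)]

def mean_impute(rows, reference_rows):
    """Fill missing income with the mean income observed in reference_rows."""
    observed = [r[2] for r in reference_rows if r[2] is not None]
    fill = mean(observed)
    return [(rid, age, inc if inc is not None else fill) for rid, age, inc in rows]

# Local imputation: party A only sees its own rows.
local = mean_impute(party_a_rows, party_a_rows)
# Joint imputation: the fill value is derived from both parties' rows.
pooled = mean_impute(party_a_rows, party_a_rows + party_b_rows)

# Vertical split: the parties hold different columns, keyed by record_id.
a_cols = {1: 30, 2: 50, 3: 40}           # party A holds age
b_cols = {2: 55.0, 3: 60.0, 5: 70.0}     # party B holds income
# Only records known to both parties can be joined; finding them is a set intersection.
shared_ids = sorted(a_cols.keys() & b_cols.keys())
joined = [(i, a_cols[i], b_cols[i]) for i in shared_ids]
```

In the vertical case the join on shared record IDs is a plain set intersection here, which hints at why a private variant of that operation is a natural building block.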

Findings

01. 20% accuracy improvement for vertically split data

02. 5% accuracy improvement for horizontally split data

03. Up to 32.7 times better imputation quality in worst-case scenarios

Abstract

Data imputation is an important data preparation task where the data analyst replaces missing or erroneous values to increase the expected accuracy of downstream analyses. The accuracy improvement of data imputation extends to private data analyses across distributed databases. However, existing data imputation methods violate the privacy of the data, rendering the privacy protection in the downstream analyses obsolete. We conclude that private data analysis requires private data imputation. In this paper, we present the first optimized protocols for private data imputation. We consider the case of horizontally and vertically split data sets. Our optimization aims to reduce most of the computation to private set intersection (or at least oblivious programmable pseudo-random function) protocols, which can be computed very efficiently. We show that private data imputation has -- on…
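For readers unfamiliar with private set intersection, the following is a toy sketch of one classic construction, based on commutative blinding by modular exponentiation. It is not the protocol from the paper, which this page does not describe. The modulus, the hash-to-group mapping, and all function names are assumptions chosen for illustration, both parties are simulated in one process, and the parameters are far too small to be secure.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1 (assumption; not a secure choice).
P = 2**127 - 1

def hash_to_group(item: str) -> int:
    """Map an item to an element of [1, P-1] (illustrative, not a vetted hash-to-group)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def random_exponent() -> int:
    """Pick a secret exponent coprime to P - 1 so that x -> x**k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def blind(items, key):
    """First blinding layer: H(x)**key mod P for each item."""
    return [pow(hash_to_group(x), key, P) for x in items]

def reblind(values, key):
    """Second blinding layer applied to already-blinded values."""
    return [pow(v, key, P) for v in values]

def toy_psi(alice_ids, bob_ids):
    """Return the items of alice_ids that also appear in bob_ids."""
    a, b = random_exponent(), random_exponent()   # each party's secret key
    alice_blinded = blind(alice_ids, a)           # Alice sends H(x)**a to Bob
    bob_blinded = blind(bob_ids, b)               # Bob sends H(y)**b to Alice
    alice_double = reblind(alice_blinded, b)      # Bob returns H(x)**(a*b), order kept
    bob_double = set(reblind(bob_blinded, a))     # Alice computes H(y)**(a*b)
    # Exponentiation commutes, so equal items produce equal double-blinded values.
    return {x for x, v in zip(alice_ids, alice_double) if v in bob_double}
```

For example, `toy_psi(["r1", "r2", "r3"], ["r2", "r3", "r4"])` yields the shared identifiers without either side exchanging its raw list. A real deployment would rely on a vetted group, shuffle the values Bob sends, and use an audited library rather than this sketch.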

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Data Quality and Management