Differentially Private Linear Regression with Linked Data
Shurong Lin, Elliot Paquette, Eric D. Kolaczyk

TL;DR
This paper introduces two differentially private algorithms for linear regression on linked data, addressing the additional uncertainty from record linkage and analyzing the privacy-accuracy tradeoff with finite-sample error bounds.
Contribution
The paper presents privacy-preserving linear regression algorithms designed specifically for linked data, incorporating linkage uncertainty into the privacy framework.
Findings
Finite-sample error bounds for estimators
Analysis of linkage error impact on accuracy
Demonstration of the algorithms via simulations and synthetic data
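To give a feel for why linkage error matters for accuracy, here is a toy simulation. It is not the paper's linkage model or estimator. It assumes a single covariate, a uniformly random fraction of mismatched records, and hypothetical helper names (`ols_slope`, `mismatch`). Under those assumptions, the ordinary least-squares slope shrinks toward zero, roughly in proportion to the share of correct links.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

def mismatch(y, rate, rng):
    """Copy y and shuffle a random `rate` fraction of its entries among themselves.

    This mimics incorrect links: the affected responses end up attached
    to the wrong covariate rows. (Illustrative assumption, not the paper's model.)
    """
    y_linked = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_linked[idx] = y[rng.permutation(idx)]
    return y_linked

rng = np.random.default_rng(0)
n, beta, rate = 20_000, 2.0, 0.3

x = rng.normal(size=n)
y = beta * x + rng.normal(scale=0.5, size=n)

slope_clean = ols_slope(x, y)                       # close to beta
slope_linked = ols_slope(x, mismatch(y, rate, rng)) # attenuated toward zero

print(f"correctly linked: {slope_clean:.3f}")
print(f"30% mismatched:   {slope_linked:.3f}  (beta * (1 - rate) = {beta * (1 - rate):.3f})")
```

The attenuation appears before any privacy noise is added, which is why linkage uncertainty has to be accounted for separately from the privacy mechanism.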
Abstract
There has been increasing demand for establishing privacy-preserving methodologies for modern statistics and machine learning. Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees. Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, with nontrivial upstream pre-processing typically not incorporated. An important example is when record linkage is done prior to downstream modeling. Record linkage refers to the statistical task of linking two or more data sets of the same group of entities without a unique identifier. This probabilistic procedure brings additional uncertainty to the subsequent task. In this paper, we present two differentially private algorithms for linear regression with linked data. In particular, we propose a noisy gradient…
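The abstract is cut off before the algorithms are described, so the following is only a generic, textbook-style sketch of noisy gradient descent for least squares. It is not the authors' method and ignores record linkage entirely. The function name and the parameters `clip_norm`, `noise_multiplier`, `lr`, and `steps` are assumptions for illustration, and the privacy accounting that would map `noise_multiplier` to a formal (ε, δ) guarantee is omitted.

```python
import numpy as np

def noisy_gradient_descent(X, y, clip_norm, noise_multiplier, lr, steps, rng):
    """Gradient descent on squared loss with per-record clipping and Gaussian noise.

    Generic sketch only: no privacy accounting is performed here.
    """
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(steps):
        residual = X @ beta - y
        # Per-record gradients of 0.5 * (x_i @ beta - y_i)^2, shape (n, d).
        grads = residual[:, None] * X
        # Clip each record's gradient to L2 norm at most clip_norm,
        # which bounds any single record's influence on the sum.
        norms = np.linalg.norm(grads, axis=1)
        scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        clipped = grads * scale[:, None]
        # Gaussian noise scaled to the clipping bound.
        noise = rng.normal(scale=noise_multiplier * clip_norm, size=d)
        beta = beta - lr * (clipped.sum(axis=0) + noise) / n
    return beta

rng = np.random.default_rng(1)
n = 5_000
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, 3))
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_private = noisy_gradient_descent(
    X, y, clip_norm=5.0, noise_multiplier=1.0, lr=0.1, steps=300, rng=rng
)
print(beta_private)
```

Raising `noise_multiplier` or lowering `clip_norm` strengthens the protection of individual records at the cost of a noisier or more biased estimate, which is the privacy-accuracy tradeoff the summary refers to.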
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Statistical Methods and Bayesian Inference
