Differentially Private Linear Regression with Linked Data

Shurong Lin; Elliot Paquette; Eric D. Kolaczyk

arXiv:2308.00836 · stat.ME · May 9, 2024

Open Access

TL;DR

This paper introduces two differentially private algorithms for linear regression on linked data, addressing the additional uncertainty from record linkage and analyzing the privacy-accuracy tradeoff with finite-sample error bounds.

Contribution

The paper presents privacy-preserving algorithms designed specifically for linear regression on linked data, incorporating linkage uncertainty into the privacy framework.

Findings

01 Finite-sample error bounds for the estimators

02 Analysis of the impact of linkage error on accuracy

03 Demonstration of the algorithms via simulations and synthetic data

Abstract

There has been increasing demand for establishing privacy-preserving methodologies for modern statistics and machine learning. Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees. Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, with nontrivial upstream pre-processing typically not incorporated. An important example is when record linkage is done prior to downstream modeling. Record linkage refers to the statistical task of linking two or more data sets of the same group of entities without a unique identifier. This probabilistic procedure brings additional uncertainty to the subsequent task. In this paper, we present two differentially private algorithms for linear regression with linked data. In particular, we propose a noisy gradient…
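
To make the idea of gradient perturbation concrete, the following is a minimal sketch of generic noisy gradient descent for least squares. It is not the authors' algorithm: it ignores linkage error entirely, and the names `clip`, `noisy_gradient_descent`, `clip_bound`, and `noise_std` are hypothetical. The noise scale is passed in directly; calibrating it to a target privacy budget is assumed to happen elsewhere.

```python
import math
import random


def clip(vec, bound):
    """Rescale vec so that its Euclidean norm is at most bound."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= bound or norm == 0.0:
        return list(vec)
    return [v * bound / norm for v in vec]


def noisy_gradient_descent(X, y, steps, lr, clip_bound, noise_std, seed=0):
    """Gradient descent on squared loss with clipped, noised gradients.

    Assumption: noise_std is supplied by the caller. This sketch does not
    derive it from a privacy budget and does not model linkage uncertainty.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(steps):
        total = [0.0] * d
        for xi, yi in zip(X, y):
            residual = sum(b * x for b, x in zip(beta, xi)) - yi
            # Per-record gradient, clipped to bound each record's influence.
            grad_i = clip([residual * x for x in xi], clip_bound)
            total = [t + g for t, g in zip(total, grad_i)]
        # Average the clipped gradients, then add Gaussian noise per coordinate.
        noisy = [t / n + rng.gauss(0.0, noise_std) for t in total]
        beta = [b - lr * g for b, g in zip(beta, noisy)]
    return beta


# Illustrative run on made-up data with true coefficients (2, -1).
data_rng = random.Random(1)
X_demo = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(200)]
y_demo = [2.0 * a - 1.0 * b for a, b in X_demo]
beta_hat = noisy_gradient_descent(
    X_demo, y_demo, steps=200, lr=0.1, clip_bound=5.0, noise_std=0.05
)
print(beta_hat)
```

Clipping caps how much any single record can move the averaged gradient, which is what lets a fixed noise scale mask an individual's contribution. A larger `noise_std` gives more masking at the cost of a noisier estimate, which is the privacy-accuracy tradeoff in miniature.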

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Statistical Methods and Bayesian Inference