Private Regression via Data-Dependent Sufficient Statistic Perturbation
Cecilia Ferrando, Daniel Sheldon

TL;DR
This paper introduces data-dependent sufficient statistic perturbation (SSP) for differentially private linear and logistic regression. It improves on data-independent SSP by post-processing privately released marginals into sufficient statistics, and it connects SSP to training on synthetic data.
Contribution
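The paper develops a data-dependent SSP method for linear regression that outperforms state-of-the-art data-independent SSP, extends it to logistic regression through an approximate objective expressed in terms of sufficient statistics, and relates training on synthetic data to SSP.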
Findings
Data-dependent SSP outperforms state-of-the-art data-independent SSP.
The approach extends to logistic regression with competitive results.
Training on synthetic data aligns with data-dependent SSP for models with sufficient statistics.
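The last finding rests on a basic property of models with sufficient statistics, which a small sketch can illustrate. For least squares, fitting on a dataset and solving the normal equations from that dataset's sufficient statistics (X^T X and X^T y) give the same coefficients. The arrays `X_syn` and `y_syn` below are hypothetical random stand-ins for a synthetic dataset, not output of any privacy mechanism, and the example says nothing about how the paper generates its data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a synthetic dataset (plain random draws here).
X_syn = rng.normal(size=(200, 3))
y_syn = X_syn @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Route 1: fit least squares directly on the synthetic records.
theta_fit, *_ = np.linalg.lstsq(X_syn, y_syn, rcond=None)

# Route 2: reduce the records to sufficient statistics, then solve
# the normal equations (X^T X) theta = X^T y.
XtX = X_syn.T @ X_syn
Xty = X_syn.T @ y_syn
theta_stats = np.linalg.solve(XtX, Xty)

# Both routes agree up to floating-point error.
same = np.allclose(theta_fit, theta_stats)
```

Because the fitted model depends on the records only through these statistics, the quality of a model trained on synthetic data is governed by how well the synthetic data reproduces them.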
Abstract
Sufficient statistic perturbation (SSP) is a widely used method for differentially private linear regression. SSP adopts a data-independent approach where privacy noise from a simple distribution is added to sufficient statistics. However, sufficient statistics can often be expressed as linear queries and better approximated by data-dependent mechanisms. In this paper we introduce data-dependent SSP for linear regression based on post-processing privately released marginals, and find that it outperforms state-of-the-art data-independent SSP. We extend this result to logistic regression by developing an approximate objective that can be expressed in terms of sufficient statistics, resulting in a novel and highly competitive SSP approach for logistic regression. We also make a connection to synthetic data for machine learning: for models with sufficient statistics, training on synthetic…
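For orientation, the data-independent baseline that the abstract describes can be sketched as follows. This is a minimal illustration of the general idea (Gaussian noise added to X^T X and X^T y, followed by a regularized solve), not the paper's data-dependent method. The function name, the `sigma` noise scale, and the `ridge` term are assumptions made for the example. In a real mechanism, `sigma` would have to be calibrated to the sensitivity of the statistics under bounded data and to the privacy budget, which this sketch does not do.

```python
import numpy as np

def ssp_linear_regression(X, y, sigma, ridge=1e-3, rng=None):
    """Illustrative data-independent SSP for linear regression.

    sigma is a hypothetical, uncalibrated noise scale; the privacy
    accounting that would set it is outside this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]

    # Sufficient statistics of least squares.
    XtX = X.T @ X
    Xty = X.T @ y

    # Symmetric Gaussian noise for X^T X: draw the upper triangle
    # (including the diagonal) and mirror it below the diagonal.
    E = rng.normal(scale=sigma, size=(d, d))
    E = np.triu(E) + np.triu(E, 1).T
    noisy_XtX = XtX + E
    noisy_Xty = Xty + rng.normal(scale=sigma, size=d)

    # A small ridge term keeps the noisy system well conditioned.
    return np.linalg.solve(noisy_XtX + ridge * np.eye(d), noisy_Xty)

# Usage on hypothetical bounded data.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y = X @ np.array([0.5, -0.25, 0.1])
theta_noisy = ssp_linear_regression(X, y, sigma=1.0, rng=rng)
```

With `sigma=0.0` and `ridge=0.0` the function reduces to ordinary least squares, which makes it easy to check the plumbing before adding noise.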
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Economic and Environmental Valuation · Census and Population Estimation
Methods: Logistic Regression · Linear Regression
