Differentially Private Ordinary Least Squares

Or Sheffet

arXiv:1507.02482·cs.DS·August 23, 2017

TL;DR

This paper develops differentially private methods for linear regression analysis that support confidence intervals and hypothesis tests while preserving data privacy, using techniques such as the Johnson-Lindenstrauss Transform and ridge regression.
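The TL;DR pairs projection with ridge regression. One standard way to connect regularization to "adding data" (an identity assumed here for illustration, not necessarily the paper's exact construction) is that appending d synthetic rows w·I to the design matrix, with zero labels, and then running plain OLS gives exactly the ridge estimator with penalty w². The helper names below are hypothetical.

```python
import numpy as np

def ridge_via_augmentation(X, y, w):
    """Plain OLS on the augmented data [X; w*I], [y; 0] (hypothetical helper)."""
    d = X.shape[1]
    X_aug = np.vstack([X, w * np.eye(d)])      # d extra rows, one per coefficient
    y_aug = np.concatenate([y, np.zeros(d)])   # zero labels for the synthetic rows
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta

def ridge_closed_form(X, y, w):
    """Ridge estimator (X^T X + w^2 I)^{-1} X^T y (hypothetical helper)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + w**2 * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
print(np.allclose(ridge_via_augmentation(X, y, w=3.0), ridge_closed_form(X, y, w=3.0)))  # prints True
```

The identity holds because the augmented normal equations are (XᵀX + w²I)β = Xᵀy: the synthetic rows add w²I to the Gram matrix and contribute nothing to Xᵀy.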

Contribution

The paper introduces differentially private confidence intervals for linear regression based on the Johnson-Lindenstrauss Transform (JLT), and analyzes their effectiveness for well-spread data and for regularized (ridge) models.
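To make "confidence intervals from projected data" concrete, the sketch below multiplies the combined matrix [X y] by an r × n Gaussian random matrix (a standard JLT) and then computes OLS estimates and t-values on the r projected rows. It is a hypothetical helper, not the paper's algorithm: no privacy-relevant singular-value check or noise is applied, and treating the projected rows as r samples (so the residual variance uses r − d degrees of freedom) is an assumption.

```python
import numpy as np

def projected_ols_t_values(X, y, r, seed=0):
    """Sketch-and-solve OLS with t-values on JLT-projected data (hypothetical helper).
    No privacy mechanism is applied here."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = np.column_stack([X, y])
    # Gaussian JLT: i.i.d. N(0, 1/r) entries, so E[(RA)^T (RA)] = A^T A.
    R = rng.standard_normal((r, n)) / np.sqrt(r)
    RA = R @ A
    RX, Ry = RA[:, :d], RA[:, d]
    beta, *_ = np.linalg.lstsq(RX, Ry, rcond=None)
    resid = Ry - RX @ beta
    # Assumption: treat the r projected rows as r samples (r - d degrees of freedom).
    sigma2 = resid @ resid / (r - d)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(RX.T @ RX)))
    return beta, beta / se

rng = np.random.default_rng(1)
n, d = 5000, 3
X = rng.standard_normal((n, d))
true_beta = np.array([2.0, -1.0, 0.0])
y = X @ true_beta + 0.1 * rng.standard_normal(n)
beta_hat, t_vals = projected_ols_t_values(X, y, r=500)
print(np.round(beta_hat, 2))
print(np.round(t_vals, 1))
```

The projection shrinks n rows to r rows; the price is that the estimates behave roughly as if only r samples had been observed, so standard errors grow accordingly.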

Findings

01

The JLT yields accurate approximations of t-values for well-spread data

02

Confidence intervals can be derived from projected data in ridge regression

03

The Analyze Gauss algorithm can produce valid confidence intervals under certain conditions
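Analyze Gauss releases the Gram matrix AᵀA plus a symmetric Gaussian noise matrix, and OLS estimates are then read off the noisy matrix. The sketch below assumes every row of A = [X y] has L2 norm at most B, so adding or removing one row changes AᵀA by at most B² in Frobenius norm, and calibrates sigma with the classic Gaussian mechanism (stated for epsilon < 1). These constants are assumptions and may differ from the paper's; the helper names are hypothetical, and no confidence-interval correction is attempted.

```python
import math
import numpy as np

def analyze_gauss(A, epsilon, delta, B, rng):
    """Release A^T A + symmetric Gaussian noise (hypothetical helper).
    Assumes each row of A has L2 norm <= B (add/remove-one-row neighbors)."""
    p = A.shape[1]
    sigma = B**2 * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    # Sample the upper triangle (with diagonal) and mirror it to get symmetric noise.
    upper = np.triu(rng.normal(0.0, sigma, size=(p, p)))
    noise = upper + np.triu(upper, 1).T
    return A.T @ A + noise

def ols_from_noisy_gram(G):
    """Split the noisy [X y]^T [X y] into X^T X and X^T y blocks and solve."""
    XtX, Xty = G[:-1, :-1], G[:-1, -1]
    return np.linalg.solve(XtX, Xty)

rng = np.random.default_rng(2)
n = 200_000
X = rng.uniform(-0.5, 0.5, size=(n, 2))
y = X @ np.array([0.6, -0.4]) + 0.01 * rng.standard_normal(n)
A = np.column_stack([X, y])
A = A / max(1.0, np.linalg.norm(A, axis=1).max())  # enforce row norm <= B = 1
G = analyze_gauss(A, epsilon=0.5, delta=1e-5, B=1.0, rng=rng)
print(ols_from_noisy_gram(G))
```

Rescaling all of A by one constant leaves the OLS coefficients unchanged, so the row-norm bound can be enforced without distorting the estimate. The noise is fixed in size while AᵀA grows with n, which is why large, well-spread datasets tolerate it.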

Abstract

Linear regression is one of the most prevalent techniques in machine learning; however, it is also common to use linear regression for its explanatory capabilities rather than for label prediction. Ordinary Least Squares (OLS) is often used in statistics to establish a correlation between an attribute (e.g. gender) and a label (e.g. income) in the presence of other (potentially correlated) features. OLS assumes a particular model that randomly generates the data, and derives t-values, representing the likelihood of each real value to be the true correlation. Using t-values, OLS can release a confidence interval, which is an interval on the reals that is likely to contain the true correlation; when this interval does not intersect the origin, we can reject the null hypothesis, as it is likely that the true correlation is non-zero. Our work aims at…
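The non-private pipeline the abstract describes (fit OLS, compute t-values, form a confidence interval, reject the null hypothesis when the interval excludes zero) can be sketched as below. This is textbook OLS, not the paper's private algorithm, and the helper name is hypothetical. To stay within NumPy and the standard library, it uses the normal quantile from `statistics.NormalDist` as a large-sample stand-in for the Student-t quantile with n − p degrees of freedom; `scipy.stats.t.ppf(1 - alpha / 2, n - p)` would give the exact value.

```python
import numpy as np
from statistics import NormalDist

def ols_confidence_intervals(X, y, alpha=0.05):
    """Textbook OLS with t-values and (approximate) 1 - alpha confidence intervals."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # unbiased estimate of the noise variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))   # standard error of each coefficient
    t = beta / se                             # t-value against the null beta_j = 0
    # Normal quantile approximates the t quantile when n - p is large.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = beta - z * se, beta + z * se
    reject = (lower > 0) | (upper < 0)        # interval does not contain the origin
    return beta, t, lower, upper, reject

rng = np.random.default_rng(3)
n = 2000
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])  # intercept plus one attribute
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(n)
beta, t, lower, upper, reject = ols_confidence_intervals(X, y)
print(beta, t, reject)
```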

Taxonomy

Methods: Linear Regression