A Differentially Private Kernel Two-Sample Test

Anant Raj; Ho Chung Leon Law; Dino Sejdinovic; Mijung Park

arXiv:1808.00380 · stat.ML · August 2, 2018

TL;DR

This paper introduces a differentially private kernel two-sample test that retains high utility while protecting privacy: it approximates the kernel test statistic with a finite-dimensional representation and perturbs that representation to obtain differential privacy guarantees.
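As a minimal sketch of what perturbing a finite-dimensional representation can look like, the snippet below releases the mean of a matrix of per-sample features through the standard Gaussian mechanism. It assumes every feature lies in [-1, 1] (as differences of Gaussian-kernel values do) and that neighboring datasets differ by replacing one sample; the function name `gaussian_mechanism_mean` is hypothetical. It privatizes only the mean vector, so it is not the paper's procedure: a complete private test would also have to protect any data-dependent covariance estimate and account for the added noise when setting the rejection threshold.

```python
import numpy as np


def gaussian_mechanism_mean(z, epsilon, delta, rng):
    """Release the row mean of z under (epsilon, delta)-DP with the Gaussian mechanism.

    Assumptions (illustrative, not from the paper): every entry of z lies in
    [-1, 1], and neighboring datasets differ by replacing a single row.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this noise calibration assumes 0 < epsilon < 1")
    n, J = z.shape
    # Each row has L2 norm at most sqrt(J), so replacing one row moves the
    # mean by at most 2 * sqrt(J) / n in L2 norm.
    l2_sensitivity = 2.0 * np.sqrt(J) / n
    # Standard Gaussian-mechanism noise scale for (epsilon, delta)-DP.
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noisy_mean = z.mean(axis=0) + rng.normal(0.0, sigma, size=J)
    return noisy_mean, sigma


rng = np.random.default_rng(2)
z = rng.uniform(-1.0, 1.0, size=(1000, 5))  # stand-in for per-sample kernel features
noisy_mean, sigma = gaussian_mechanism_mean(z, epsilon=0.5, delta=1e-5, rng=rng)
print(sigma)
print(noisy_mean - z.mean(axis=0))  # the privacy noise actually added
```

Because the sensitivity, and hence `sigma`, scales like 1/n while the sampling error of the mean scales like 1/sqrt(n), the privacy noise becomes small relative to the sampling error as the sample grows.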

Contribution

It proposes a novel framework for differentially private kernel two-sample testing that combines finite-dimensional approximations of the test statistic with a simple chi-squared test, addressing privacy concerns in the analysis of sensitive data.
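The combination of a finite-dimensional approximation with a chi-squared test can be illustrated with a mean-embedding style statistic. In the sketch below, which is an assumption rather than the paper's code, each pair (x_i, y_i) is mapped to z_i, the differences of Gaussian-kernel evaluations at J test locations. The statistic n w^T (S + reg I)^{-1} w, built from the mean w and sample covariance S of the z_i, is compared with a chi-squared distribution with J degrees of freedom, its asymptotic null distribution. The paired equal-size samples, the fixed test locations, the bandwidth, the regularizer `reg`, and the names `me_features` and `me_test` are illustrative, and no privacy noise is added here.

```python
import numpy as np
from scipy.stats import chi2


def me_features(x, y, locations, bandwidth=1.0):
    """Map each pair (x_i, y_i) to k(x_i, t_j) - k(y_i, t_j) for J locations t_j; returns n x J."""
    def kernel_at_locations(a):
        # Squared distances between every sample and every test location: n x J.
        sq = np.sum((a[:, None, :] - locations[None, :, :]) ** 2, axis=2)
        return np.exp(-sq / (2.0 * bandwidth**2))
    return kernel_at_locations(x) - kernel_at_locations(y)


def me_test(x, y, locations, bandwidth=1.0, reg=1e-5, alpha=0.05):
    """Non-private finite-dimensional test; the statistic is compared with chi-squared(J)."""
    z = me_features(x, y, locations, bandwidth)
    n, J = z.shape
    w = z.mean(axis=0)                          # mean of the J features
    S = np.cov(z, rowvar=False).reshape(J, J)   # J x J sample covariance
    stat = n * w @ np.linalg.solve(S + reg * np.eye(J), w)
    p_value = chi2.sf(stat, df=J)
    return stat, p_value, p_value < alpha


rng = np.random.default_rng(1)
n = 1000
locations = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.0]])  # J = 3 fixed test locations
x = rng.normal(0.0, 1.0, size=(n, 2))
y_null = rng.normal(0.0, 1.0, size=(n, 2))  # same distribution as x (H0 holds)
y_alt = rng.normal(1.0, 1.0, size=(n, 2))   # mean shifted (H1 holds)
print(me_test(x, y_null, locations))
print(me_test(x, y_alt, locations))
```

Fixing the locations keeps the example reproducible; in practice they might be drawn at random or tuned on held-out data.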

Findings

1. Requires only a modest increase in sample size to reach power comparable to that of non-private tests.
2. Remains effective in two realistic data settings, retaining utility under privacy constraints.
3. Provides a practical approach to privacy-preserving statistical testing without heavily compromising accuracy.

Abstract

Kernel two-sample testing is a useful statistical tool in determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive information about individuals who participate in scientific studies, which makes the current tests vulnerable to privacy breaches. Hence, we design a new framework for kernel two-sample testing conforming to differential privacy constraints, in order to guarantee the privacy of subjects in the data. Unlike existing differentially private parametric tests that simply add noise to data, kernel-based testing imposes a challenge due to a complex dependence of test statistics on the raw data, as these statistics correspond to estimators of distances between representations of probability measures in Hilbert spaces. Our approach considers finite…
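The distances between representations of probability measures in Hilbert spaces mentioned in the abstract are usually the maximum mean discrepancy (MMD) between kernel mean embeddings. The sketch below computes the standard unbiased estimate of the squared MMD with a Gaussian kernel; the bandwidth and the function names are illustrative assumptions. Every sample enters the estimate through kernel evaluations against all other samples, which illustrates the complex dependence of the test statistic on the raw data that the abstract refers to.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples x (m x d) and y (n x d)."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Exclude the diagonal terms k(x_i, x_i) and k(y_j, y_j) so the within-sample averages are unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    term_xy = kxy.sum() / (m * n)
    return term_xx + term_yy - 2.0 * term_xy


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y_same = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as x
y_shift = rng.normal(1.0, 1.0, size=(200, 2))  # mean shifted by 1 in each coordinate
print(mmd2_unbiased(x, y_same))   # near zero (may be slightly negative)
print(mmd2_unbiased(x, y_shift))  # clearly positive
```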
