TL;DR
This paper introduces a differentially private kernel two-sample test that preserves utility by approximating the kernel test statistic with a finite-dimensional representation and perturbing that representation to obtain the privacy guarantee.
Contribution
The paper proposes a framework for differentially private kernel two-sample testing that combines finite-dimensional approximations with a simple chi-squared test, addressing privacy concerns in the analysis of sensitive data.
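The general recipe (finite-dimensional features, a perturbed mean, a chi-squared test) can be illustrated with a minimal sketch. This is not the authors' implementation. The choice of random Fourier features, the paired equal-size samples, the Gaussian mechanism calibration, and all function and parameter names (`private_two_sample_pvalue`, `n_freq`, `bandwidth`) are assumptions made for illustration. In particular, only the mean feature difference is privatized here; the empirical covariance is used as-is, so a complete mechanism would also have to protect it.

```python
import math
import numpy as np

def random_fourier_features(X, W):
    """Map each row of X to cos/sin features; every output row has L2 norm 1."""
    proj = X @ W                                   # shape (n, n_freq)
    return np.hstack([np.cos(proj), np.sin(proj)]) / math.sqrt(W.shape[1])

def chi2_sf_even(x, df):
    """Chi-squared survival function, closed form valid for even df only."""
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2.0) / i
        total += term
    return min(1.0, math.exp(-x / 2.0) * total)

def private_two_sample_pvalue(X, Y, epsilon, delta,
                              n_freq=10, bandwidth=1.0, seed=0):
    """Hypothetical sketch: p-value from a noise-perturbed mean feature difference.

    Assumes X and Y have the same shape and that rows are paired.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / bandwidth, size=(d, n_freq))

    # Per-pair feature difference; each row has norm at most 2.
    Z = random_fourier_features(X, W) - random_fourier_features(Y, W)
    w_bar = Z.mean(axis=0)

    # Replacing one pair moves the mean by at most 4/n in L2 norm.
    sensitivity = 4.0 / n
    # Standard Gaussian mechanism calibration (stated for 0 < epsilon < 1).
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    w_priv = w_bar + rng.normal(scale=sigma, size=w_bar.shape)

    # Covariance of the noisy mean: sampling variance plus injected noise.
    # NOTE: the empirical covariance below is NOT privatized in this sketch.
    cov = np.cov(Z, rowvar=False) / n + sigma**2 * np.eye(Z.shape[1])

    stat = float(w_priv @ np.linalg.solve(cov, w_priv))
    return chi2_sf_even(stat, df=Z.shape[1])       # df = 2 * n_freq, always even
```

Under these assumptions, calling `private_two_sample_pvalue(X, Y, epsilon=0.9, delta=1e-5)` on two arrays of shape `(n, d)` returns a p-value. Adding `sigma**2` to the covariance accounts for the injected noise in the null distribution, which is one simple way to keep the chi-squared approximation reasonable after perturbation.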
Findings
The private test requires only a modest increase in sample size to reach power comparable to non-private tests.
Effective in two realistic data settings, maintaining utility under privacy constraints.
Provides a practical approach to privacy-preserving statistical testing without heavily compromising accuracy.
Abstract
Kernel two-sample testing is a useful statistical tool in determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive information about individuals who participate in scientific studies, which makes the current tests vulnerable to privacy breaches. Hence, we design a new framework for kernel two-sample testing conforming to differential privacy constraints, in order to guarantee the privacy of subjects in the data. Unlike existing differentially private parametric tests that simply add noise to data, kernel-based testing imposes a challenge due to a complex dependence of test statistics on the raw data, as these statistics correspond to estimators of distances between representations of probability measures in Hilbert spaces. Our approach considers finite…
