# Impact of Prior Knowledge and Data Correlation on Privacy Leakage: A Unified Analysis

**Authors:** Yanan Li, Xuebin Ren, Shusen Yang, and Xinyu Yang

arXiv: 1906.02606 · 2019-06-07

## TL;DR

This paper presents a unified analysis of how data correlation, the adversary's prior knowledge, and query sensitivity influence privacy leakage. It proposes a new privacy definition, prior differential privacy (PDP), and derives a closed-form leakage expression for continuous data and a chain rule for discrete data.

## Contribution

The paper defines prior differential privacy (PDP), which evaluates privacy leakage against the exact prior knowledge held by the adversary, and uses it to give a unified analysis of how data correlation, prior knowledge, and query sensitivity affect leakage.

## Key findings

- Positive, negative, and hybrid correlations affect privacy leakage differently.
- For general correlations, a closed-form expression of privacy leakage is derived for continuous data, and a chain rule is presented for discrete data.
- The analysis applies to general linear queries like count, sum, mean, and histogram.
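For readers unfamiliar with the linear queries listed above, the following is a minimal sketch of the standard Laplace mechanism applied to a weighted-sum query. It is ordinary differential privacy, not the paper's PDP analysis. The function names, the example data, and the assumption that each record lies in an interval of width `value_range` (with neighbouring datasets differing in one record's value) are illustrative choices, not taken from the paper.

```python
import random

def linear_query(data, weights):
    """Weighted sum of the records: sum uses w_i = 1, mean uses w_i = 1/n."""
    return sum(w * x for w, x in zip(weights, data))

def l1_sensitivity(weights, value_range):
    """Largest change in the query when one record moves by at most value_range."""
    return max(abs(w) for w in weights) * value_range

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(data, weights, value_range, epsilon, rng):
    """Release the linear query with noise calibrated to sensitivity / epsilon."""
    scale = l1_sensitivity(weights, value_range) / epsilon
    return linear_query(data, weights) + laplace_noise(scale, rng)

# Hypothetical data: four ages, each assumed to lie in [0, 100].
rng = random.Random(0)
ages = [23, 35, 41, 58]
n = len(ages)
noisy_sum = laplace_mechanism(ages, [1.0] * n, 100.0, 0.5, rng)
noisy_mean = laplace_mechanism(ages, [1.0 / n] * n, 100.0, 0.5, rng)
print(noisy_sum, noisy_mean)
```

A count query fits the same shape when the records are 0/1 indicators and `value_range` is 1. This calibration treats records as independent; the abstract's point is that correlated records may leak more than the nominal epsilon suggests.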

## Abstract

It has been widely understood that differential privacy (DP) can guarantee rigorous privacy against adversaries with arbitrary prior knowledge. However, recent studies demonstrate that this may not be true for correlated data, and indicate that three factors could influence privacy leakage: the data correlation pattern, prior knowledge of adversaries, and sensitivity of the query function. This poses a fundamental problem: what is the mathematical relationship between the three factors and privacy leakage? In this paper, we present a unified analysis of this problem. A new privacy definition, named *prior differential privacy (PDP)*, is proposed to evaluate privacy leakage considering the exact prior knowledge possessed by the adversary. We use two models, the weighted hierarchical graph (WHG) and the multivariate Gaussian model, to analyze discrete and continuous data, respectively. We demonstrate that positive, negative, and hybrid correlations have distinct impacts on privacy leakage. Considering general correlations, a closed-form expression of privacy leakage is derived for continuous data, and a chain rule is presented for discrete data. Our results are valid for general linear queries, including count, sum, mean, and histogram. Numerical experiments are presented to verify our theoretical analysis.
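To make concrete why correlation and prior knowledge matter in a Gaussian setting, the sketch below applies the textbook conditioning identity for a bivariate Gaussian: an adversary who already knows one record updates their belief about a correlated target record. This is not the paper's PDP expression or its closed-form leakage result; the function name, parameters, and numbers are hypothetical and chosen only for illustration.

```python
def conditional_gaussian(mean_t, var_t, mean_k, var_k, cov_tk, observed_k):
    """Posterior mean and variance of a target record given one known record.

    Uses the standard bivariate Gaussian identity:
      mean_t + (cov_tk / var_k) * (observed_k - mean_k)
      var_t  - cov_tk**2 / var_k
    """
    gain = cov_tk / var_k
    cond_mean = mean_t + gain * (observed_k - mean_k)
    cond_var = var_t - gain * cov_tk
    return cond_mean, cond_var

# Unit-variance records, so the covariance equals the correlation coefficient.
# The adversary is assumed to know that the other record equals 1.0.
for rho in (0.0, 0.8, -0.8):
    m, v = conditional_gaussian(0.0, 1.0, 0.0, 1.0, rho, 1.0)
    print(f"rho={rho:+.1f}  posterior mean={m:+.2f}  posterior variance={v:.2f}")
```

With zero correlation the known record tells the adversary nothing about the target; with nonzero correlation the adversary's remaining uncertainty shrinks before any query result is released. How this interacts with the noise added to a query answer is what the paper's analysis addresses.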

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.02606/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1906.02606/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1906.02606/full.md

---
Source: https://tomesphere.com/paper/1906.02606