Local differential privacy: Elbow effect in optimal density estimation and adaptation over Besov ellipsoids

Cristina Butucea; Amandine Dubois; Martin Kroll; Adrien Saumard

arXiv:1903.01927·math.ST·March 6, 2019

TL;DR

This paper investigates the limits of non-parametric density estimation under local differential privacy constraints, revealing an elbow effect in convergence rates and proposing wavelet-based estimators that achieve near-optimal performance.

Contribution

It introduces a lower bound on estimation rates under local differential privacy and develops wavelet estimators that adaptively attain these bounds across Besov spaces.

Findings

01. Lower bounds show that convergence rates deteriorate under privacy constraints.

02. A linear wavelet estimator attains the lower bound when p ≥ r.

03. An adaptive non-linear wavelet estimator achieves near-optimal rates (up to a logarithmic factor) in all cases.

Equations

$$Z_{i}\sim Q_{i}(\,\cdot\mid X_{i}=x_{i},Z_{1}=z_{1},\ldots,Z_{i-1}=z_{i-1})$$

$$Z_{i}\sim Q(\,\cdot\mid X_{i}=x_{i})$$

$$\sup_{A\in\mathscr{Z}}\frac{Q_{i}(A\mid X_{i}=x,Z_{1}=z_{1},\ldots,Z_{i-1}=z_{i-1})}{Q_{i}(A\mid X_{i}=x^{\prime},Z_{1}=z_{1},\ldots,Z_{i-1}=z_{i-1})}\leq\exp(\alpha)\quad\text{for all }x,x^{\prime}\in\mathcal{X}.$$

$$\sup_{A\in\mathscr{Z}}\frac{Q(A\mid X_{i}=x)}{Q(A\mid X_{i}=x^{\prime})}\leq\exp(\alpha)\quad\text{for all }x,x^{\prime}\in\mathcal{X}.$$

$$\sup_{z\in\mathcal{Z}}\frac{q(z\mid X_{i}=x)}{q(z\mid X_{i}=x^{\prime})}\leq\exp(\alpha)\quad\text{for all }x,x^{\prime}\in\mathcal{X}.$$

$$\mathcal{R}_{n}(\ell,\mathcal{F})=\inf_{\widetilde{f}}\sup_{f\in\mathcal{F}}\mathbb{E}[\ell(\widetilde{f},f)]$$

$$\sup_{f\in\mathcal{F}}\mathbb{E}[\ell(\widehat{f},f)]\leq C(\ell,\mathcal{F})\,\mathcal{R}_{n}(\ell,\mathcal{F}).$$

$$\{\varphi_{j_{0}k}=2^{j_{0}/2}\varphi(2^{j_{0}}(\cdot)-k):k\in\mathbb{Z}\}\cup\{\psi_{jk}=2^{j/2}\psi(2^{j}(\cdot)-k):j\geq j_{0},\,k\in\mathbb{Z}\}.$$

$$f=\sum_{k\in\mathbb{Z}}\alpha_{j_{0}k}\varphi_{j_{0}k}+\sum_{j\geq j_{0}}\sum_{k\in\mathbb{Z}}\beta_{jk}\psi_{jk}$$

$$\alpha_{j_{0}k}=\int_{\mathbb{R}}f(x)\varphi_{j_{0}k}(x)\,\mathrm{d}x\quad\text{and}\quad\beta_{jk}=\int_{\mathbb{R}}f(x)\psi_{jk}(x)\,\mathrm{d}x.$$

$$J_{spq}(f):=\lVert\alpha_{0\cdot}\rVert_{p}+\bigg(\sum_{j\geq 0}\big(2^{j(s+1/2-1/p)}\lVert\beta_{j\cdot}\rVert_{p}\big)^{q}\bigg)^{1/q}$$

$$\mathcal{B}^{s}_{pq}(L)=\{f:\mathbb{R}\to\mathbb{R}:J_{spq}(f)\leq L\}.$$

$$\mathcal{D}^{s}_{pq}=\mathcal{D}^{s}_{pq}(L,T)=\Big\{f:f\in\mathcal{B}^{s}_{pq}(L),\ f\geq 0,\ \int_{\mathbb{R}}f(x)\,\mathrm{d}x=1\text{ and }\mathrm{supp}(f)\subseteq[-T,T]\Big\},$$

$$\mathcal{R}_{n}(\lVert\cdot\rVert_{r}^{r},\mathcal{D}^{s}_{pq})\gtrsim\mathfrak{r}_{n},\quad\text{where}\quad\mathfrak{r}_{n}=\begin{cases}n^{-\frac{rs}{2s+1}},&\text{if }p>\frac{r}{2s+1},\\(n/\log n)^{-\frac{r(s-1/p+1/r)}{2(s-1/p)+1}},&\text{if }p\leq\frac{r}{2s+1},\end{cases}$$

$$\sum_{k\in\mathbb{Z}}\alpha^{\prime}_{j_{0}k}\varphi_{j_{0}k}(x)+\sum_{j=j_{0}}^{j_{1}}\sum_{k\in\mathbb{Z}}\beta^{\prime}_{jk}\psi_{jk}(x)$$

$$\mathcal{R}^{\ast}_{n,\alpha}(\ell,\mathcal{F})=\inf_{\substack{\widetilde{f}\\Q\in\mathcal{Q}_{\alpha}}}\sup_{f\in\mathcal{F}}\mathbb{E}_{f,Q}[\ell(\widetilde{f},f)].$$

$$\sup_{f\in\mathcal{F}}\mathbb{E}_{f,\widehat{Q}}[\ell(\widehat{f},f)]\leq C(\ell,\mathcal{F})\,\mathcal{R}^{\ast}_{n,\alpha}(\ell,\mathcal{F}).$$

$$\mathcal{R}^{\ast}_{n,\alpha}(\lVert\cdot\rVert_{r}^{r},\mathcal{D}^{s}_{pq})\gtrsim\mathfrak{r}^{\ast}_{n,\alpha},\quad\text{where}\quad\mathfrak{r}^{\ast}_{n,\alpha}=\begin{cases}(n(e^{\alpha}-1)^{2})^{-\frac{rs}{2s+2}},&\text{if }p>\frac{r}{s+1},\\\Big(\frac{n(e^{\alpha}-1)^{2}}{\log(n(e^{\alpha}-1)^{2})}\Big)^{-\frac{r(s-1/p+1/r)}{2(s-1/p)+2}},&\text{if }p\leq\frac{r}{s+1}.\end{cases}$$

$$\inf_{\substack{\widetilde{f}\\Q\in\mathcal{Q}_{\alpha}}}\sup_{f\in\mathcal{D}^{s}_{pq}(L,T)}\mathbb{E}_{f,Q}\lVert\widetilde{f}-f\rVert_{r}^{r}\gtrsim(n(e^{\alpha}-1)^{2})^{-\frac{rs}{2s+2}},$$

$$\inf_{\substack{\widetilde{f}\\Q\in\mathcal{Q}_{\alpha}}}\sup_{f\in\mathcal{D}^{s}_{pq}(L,T)}\mathbb{E}_{f,Q}\lVert\widetilde{f}-f\rVert_{r}^{r}\gtrsim\bigg(\frac{\log(n(e^{\alpha}-1)^{2})}{n(e^{\alpha}-1)^{2}}\bigg)^{r\cdot\frac{s-1/p+1/r}{2(s-1/p)+2}},$$

$$\mathcal{R}^{\ast}_{n,\alpha}\geq\mathcal{R}_{n}\vee\mathfrak{r}^{\ast}_{n,\alpha}\geq\mathfrak{r}_{n}\vee\mathfrak{r}^{\ast}_{n,\alpha},$$

$$\mathcal{R}^{\ast}_{n,\alpha}(\lVert\cdot\rVert_{r}^{r},\mathcal{D}^{s}_{pq})\gtrsim\begin{cases}n^{-\frac{rs}{2s+1}}\vee(n(e^{\alpha}-1)^{2})^{-\frac{rs}{2s+2}},&\text{if }p>\frac{r}{s+1},\\n^{-\frac{rs}{2s+1}}\vee\Big(\frac{n(e^{\alpha}-1)^{2}}{\log(n(e^{\alpha}-1)^{2})}\Big)^{-\frac{r(s-1/p+1/r)}{2(s-1/p)+2}},&\text{if }\frac{r}{2s+1}<p\leq\frac{r}{s+1},\\\Big(\frac{n}{\log n}\Big)^{-\frac{r(s-1/p+1/r)}{2(s-1/p)+1}}\vee\Big(\frac{n(e^{\alpha}-1)^{2}}{\log(n(e^{\alpha}-1)^{2})}\Big)^{-\frac{r(s-1/p+1/r)}{2(s-1/p)+2}},&\text{if }p\leq\frac{r}{2s+1}.\end{cases}$$

$$\varphi\text{ and }\psi\text{ are compactly supported on an interval }[-A,A].$$

$$Z_{ijk}=\begin{cases}\varphi_{j_{0}k}(X_{i})+\sigma_{j_{0}-1}W_{i,j_{0}-1,k},&\text{if }j=j_{0}-1,\ k\in\mathcal{N}_{j_{0}-1},\\\psi_{jk}(X_{i})+\widetilde{\sigma}_{j}W_{ijk},&\text{if }j\in\llbracket j_{0},j_{1}\rrbracket,\ k\in\mathcal{N}_{j},\end{cases}$$

$$\sigma_{j_{0}-1}=\frac{4c_{A}\lVert\varphi\rVert_{\infty}}{\alpha}\cdot 2^{j_{0}/2}\quad\text{and}\quad\widetilde{\sigma}_{j}=\frac{4c_{A}\lVert\psi\rVert_{\infty}}{\alpha}\cdot\frac{\sqrt{2}}{\sqrt{2}-1}\cdot 2^{j_{1}/2},$$

$$Z_{ijk}=\begin{cases}\varphi_{j_{0}k}(X_{i})+\sigma_{j_{0}-1}W_{i,j_{0}-1,k},&\text{if }j=j_{0}-1,\ k\in\mathcal{N}_{j_{0}-1},\\\psi_{jk}(X_{i})+\sigma_{j}W_{ijk},&\text{if }j\in\llbracket j_{0},j_{1}\rrbracket,\ k\in\mathcal{N}_{j},\end{cases}$$

$$\sigma_{j_{0}-1}=\frac{4c_{A}\lVert\varphi\rVert_{\infty}}{\alpha}\cdot 2^{j_{0}/2}\quad\text{and}\quad\sigma_{j}=\frac{4c_{A}\lVert\psi\rVert_{\infty}}{\alpha}\cdot\frac{2\nu-1}{\nu-1}\cdot(j\vee 1)^{\nu}\cdot 2^{j/2},$$

$$q^{Z_{i}\mid X_{i}=x_{i}}(z_{i})$$

$$\frac{q^{Z_{i}\mid X_{i}=x_{i}}(z_{i})}{q^{Z_{i}\mid X_{i}=x_{i}^{\prime}}(z_{i})}$$

$${}\cdot\prod_{j=j_{0}}^{j_{1}}\prod_{k\in\mathcal{N}_{j}}\exp\bigg(\frac{\lvert z_{ijk}-\psi_{jk}(x_{i}^{\prime})\rvert-\lvert z_{ijk}-\psi_{jk}(x_{i})\rvert}{\widetilde{\sigma}_{j}}\bigg)$$

$${}\leq\exp\bigg(\sum_{k\in\mathcal{N}_{j_{0}-1}}\frac{\lvert\varphi_{j_{0}k}(x_{i})\rvert+\lvert\varphi_{j_{0}k}(x_{i}^{\prime})\rvert}{\sigma_{j_{0}-1}}\bigg)\cdot\exp\bigg(\sum_{j=j_{0}}^{j_{1}}\sum_{k\in\mathcal{N}_{j}}\frac{\lvert\psi_{jk}(x_{i})\rvert+\lvert\psi_{jk}(x_{i}^{\prime})\rvert}{\widetilde{\sigma}_{j}}\bigg).$$

Full text

Local differential privacy: Elbow effect in optimal density estimation and adaptation over Besov ellipsoids

Cristina Butucea, CREST, ENSAE, Institut Polytechnique de Paris, 5 avenue Henry Le Chatelier, F-91120 Palaiseau

Amandine Dubois, CREST-ENSAI, Campus de Ker-Lann - Rue Blaise Pascal - BP 37203 - 35172 BRUZ cedex

Martin Kroll, CREST, ENSAE, Institut Polytechnique de Paris, 5 avenue Henry Le Chatelier, F-91120 Palaiseau

Adrien Saumard, CREST-ENSAI, Campus de Ker-Lann - Rue Blaise Pascal - BP 37203 - 35172 BRUZ cedex

Abstract.

We address the problem of non-parametric density estimation under the additional constraint that only privatised data are allowed to be published and available for inference. For this purpose, we adopt a recent generalisation of classical minimax theory to the framework of local $\alpha$ -differential privacy and provide a lower bound on the rate of convergence over Besov spaces $\mathcal{B}^{s}_{pq}$ under mean integrated $\mathbb{L}^{r}$ -risk. This lower bound is deteriorated compared to the standard setup without privacy, and reveals a twofold elbow effect. In order to fulfil the privacy requirement, we suggest adding suitably scaled Laplace noise to empirical wavelet coefficients. Upper bounds within (at most) a logarithmic factor are derived under the assumption that $\alpha$ stays bounded as $n$ increases: A linear but non-adaptive wavelet estimator is shown to attain the lower bound whenever $p\geq r$ but provides a slower rate of convergence otherwise. An adaptive non-linear wavelet estimator with appropriately chosen smoothing parameters and thresholding is shown to attain the lower bound within a logarithmic factor for all cases.

Key words and phrases:

Density estimation, Besov classes of functions, Local differential privacy, Lower bounds, Minimax rates, Adaptive estimation, Wavelet thresholding

2010 Mathematics Subject Classification:

62G07 (primary), and 62G20 (secondary)

1. Introduction

Problem statement

In the modern information age, increasingly more institutions are collecting and storing data. Provided that a certain amount of privacy is guaranteed, some of these institutions might be willing to provide access to selected data sets. Examples of such data may include information about participants in a medical study, clients of a web service, or persons interviewed in a scientific survey. In this framework, the following questions arise naturally: How can data be sufficiently anonymised, given a rigorous definition of privacy, and what are the consequences for subsequent data analyses resulting from the chosen anonymisation procedure? The answer to these questions depends on several interacting parameters, namely the privacy definition at hand, the potential extent of collaboration of the involved data holding entities, and the kind of data mining tasks that should be feasible based on the private data.

In this paper, we consider the problem of non-parametric density estimation under local differential privacy as a special instance of the general problem sketched in the previous paragraph: For $i=1,\ldots,n$ , the $i$ -th data holder observes a real-valued random variable $X_{i}$ distributed according to a probability density function $f$ . The aim is that every data holder releases an anonymised view $Z_{i}$ of $X_{i}$ such that the privacy notion of local differential privacy, that is introduced next, is satisfied and that the density $f$ can be estimated from the data $Z_{1},\ldots,Z_{n}$ in an optimal way.

Local differential private estimation

The notion of local differential privacy aggregates two different concepts, namely local privacy and differential privacy, that we explain in the sequel.

The qualitative notion of local privacy characterises how the different entities holding the data $X_{1},\ldots,X_{n}$ might interact to generate a private release $Z$. It is opposed to the concept of global privacy where the respective data holders share confidence in a common curator who has access to the ensemble of non-masked data $X_{1},\ldots,X_{n}$ and generates the releasable data from this complete information. In the local setup, no such authority trusted by all parties exists. However, some amount of interaction between the different parties is still allowed. The releasable data $Z_{1},\ldots,Z_{n}$ are obtained by successively applying suitable Markov kernels. Given $X_{i}=x_{i}$ and $Z_{1}=z_{1},\ldots,Z_{i-1}=z_{i-1}$, the $i$-th data holder draws

$$Z_{i}\sim Q_{i}(\,\cdot\mid X_{i}=x_{i},Z_{1}=z_{1},\ldots,Z_{i-1}=z_{i-1})$$

for some Markov kernel $Q_{i}:\mathscr{Z}\times\mathcal{X}\times\mathcal{Z}^{i-1}\to[0,1]$ where the measure spaces of the non-private and private data are denoted with $(\mathcal{X},\mathscr{X})$ and $(\mathcal{Z},\mathscr{Z})$ , respectively. An important special case is that of non-interactive local privacy where the random value of $Z_{i}$ depends on $X_{i}$ only and must not depend on preceding values of $Z$ . More precisely, in the non-interactive case we have

$$Z_{i}\sim Q(\,\cdot\mid X_{i}=x_{i})$$

for some Markov kernel $Q$ that no longer depends on the index $i$. The non-interactive scenario seems to be more attractive in practice since no communication between the data holders is assumed and it is balanced in the sense that no participant obtains any information about any other participant's data. From a mathematical point of view, however, allowing interactive procedures as well does not lead to more technical proofs. Thus, we allow interactive methods in our minimax analysis, although the anonymisation techniques proposed in this paper are exclusively non-interactive. Let us mention that for some tasks, however, interactive mechanisms provide natural and attractive alternatives (for instance, for private estimation in generalized linear models; see [DJW18], Section 5.2.1).

The notion of differential privacy is a quantitative one and introduces a condition that makes the problem at hand mathematically tractable. We provide its definition for the locally private case only and refer the reader to [WZ10] for a definition in the global case.

Definition 1.1.

A sequence of Markov kernels $Q_{i}:\mathscr{Z}\times\mathcal{X}\times\mathcal{Z}^{i-1}\to[0,1]$ provides $\alpha$ -differential privacy if

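$$\sup_{A\in\mathscr{Z}}\frac{Q_{i}(A\mid X_{i}=x,Z_{1}=z_{1},\ldots,Z_{i-1}=z_{i-1})}{Q_{i}(A\mid X_{i}=x^{\prime},Z_{1}=z_{1},\ldots,Z_{i-1}=z_{i-1})}\leq\exp(\alpha)\quad\text{for all }x,x^{\prime}\in\mathcal{X}.$$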

In the non-interactive case, this condition is replaced with

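$$\sup_{A\in\mathscr{Z}}\frac{Q(A\mid X_{i}=x)}{Q(A\mid X_{i}=x^{\prime})}\leq\exp(\alpha)\quad\text{for all }x,x^{\prime}\in\mathcal{X}.$$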

We denote with $\mathcal{Q}_{\alpha}$ the set of all local $\alpha$ -differential private Markov kernels.

Thus, the parameter $\alpha$ quantifies the amount of privacy that is guaranteed: Setting $\alpha=0$ ensures perfect privacy whereas letting $\alpha$ tend to infinity softens the privacy restriction. In the non-interactive case, let us suppose that the Markov kernel $Q:\mathscr{Z}\times\mathcal{X}\to[0,1]$ has a density $q$ with respect to some dominating measure. Then, the defining property of $\alpha$ -differential privacy is equivalent to

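$$\sup_{z\in\mathcal{Z}}\frac{q(z\mid X_{i}=x)}{q(z\mid X_{i}=x^{\prime})}\leq\exp(\alpha)\quad\text{for all }x,x^{\prime}\in\mathcal{X}.$$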

A consequence of the definition of $\alpha$-differential privacy is plausible deniability of the data in the following sense: Given the private view $Z_{i}$ only, any test of the null hypothesis $H_{0}:X_{i}=x$ against the alternative $H_{1}:X_{i}=x^{\prime}$ with prescribed type I error probability $\gamma$ has power bounded from above by $\gamma\exp(\alpha)$ (see [WZ10], Theorem 2.4).
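As an illustration of the density-ratio form of the privacy condition, the following minimal sketch checks it numerically for a plain Laplace mechanism. The setup is an assumption made for this example only and is not the mechanism studied in the paper: the raw value lies in $[-1,1]$, the release is $Z=X+(2/\alpha)W$ with $W$ standard Laplace, and the helper names `laplace_density` and `max_density_ratio` are hypothetical.

```python
import numpy as np

def laplace_density(z, x, scale):
    """Density at z of x + scale * W, where W is standard Laplace."""
    return np.exp(-np.abs(z - x) / scale) / (2.0 * scale)

def max_density_ratio(alpha, x_grid, z_grid):
    """Largest ratio q(z | x) / q(z | x') over the supplied grids."""
    # Assumption: x lives in [-1, 1], so two inputs differ by at most 2.
    scale = 2.0 / alpha
    dens = laplace_density(z_grid[None, :], x_grid[:, None], scale)
    # For each z, compare the largest and the smallest conditional density.
    return float(np.max(dens.max(axis=0) / dens.min(axis=0)))

alpha = 0.5
x_grid = np.linspace(-1.0, 1.0, 41)
z_grid = np.linspace(-10.0, 10.0, 2001)
ratio = max_density_ratio(alpha, x_grid, z_grid)
print(ratio, np.exp(alpha))
```

On these grids the largest ratio should not exceed $\exp(\alpha)$, up to floating point error, which is what the density form of the definition requires.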

Rate optimal density estimation over Besov ellipsoids

Let us briefly review some well-known results on non-parametric density estimation in the non-private setup where $X_{1},\ldots,X_{n}$ can be observed. This classical model provides a natural benchmark for the model where additional privacy restrictions are imposed, and having in mind the results for this benchmark model turns out to be useful for understanding the ones for the model with privacy.

Density estimation from a sample $X_{1},\ldots,X_{n}$ of observations is one of the paradigmatic problems in non-parametric statistics. A popular framework is that of minimax optimal estimation: Given a loss function $\ell$ (that is, a function mapping a pair of density functions $(f,g)$ to some non-negative real number) and any class $\mathcal{F}$ of candidate density functions, the quantity of interest is the minimax risk

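$$\mathcal{R}_{n}(\ell,\mathcal{F})=\inf_{\widetilde{f}}\sup_{f\in\mathcal{F}}\mathbb{E}[\ell(\widetilde{f},f)]\tag{1.2}$$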

where the infimum is taken over all estimators (that is, $\sigma(X_{1},\ldots,X_{n})$ -measurable functions). In this setup, an estimator $\widehat{f}$ is called rate optimal if

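$$\sup_{f\in\mathcal{F}}\mathbb{E}[\ell(\widehat{f},f)]\leq C(\ell,\mathcal{F})\,\mathcal{R}_{n}(\ell,\mathcal{F}).$$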

Several function classes, loss functions and types of estimators have been intensively studied for the density estimation problem (see [Tsy09] and [GN16] for comprehensive overviews of the topic). Throughout this paper, we consider the integrated risk associated to $\mathbb{L}^{r}$ -loss defined by $\ell(f,g)=\lVert f-g\rVert_{r}^{r}$ for $r\geq 1$ . For the Besov spaces to be considered in the sequel, wavelet methods have turned out particularly convenient. Given a father wavelet $\varphi$ and a mother wavelet $\psi$ associated to it, verifying some sufficient conditions (see conditions (5.10)–(5.12) in [Här+98]), and an integer $j_{0}\in\mathbb{Z}$ , a wavelet basis of $\mathbb{L}^{2}(\mathbb{R})$ is given by

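$$\{\varphi_{j_{0}k}=2^{j_{0}/2}\varphi(2^{j_{0}}(\cdot)-k):k\in\mathbb{Z}\}\cup\{\psi_{jk}=2^{j/2}\psi(2^{j}(\cdot)-k):j\geq j_{0},\,k\in\mathbb{Z}\}.\tag{1.3}$$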

Given such a basis, the probability density $f$ admits the following formal expansion (in $\mathbb{L}^{2}$ sense):

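$$f=\sum_{k\in\mathbb{Z}}\alpha_{j_{0}k}\varphi_{j_{0}k}+\sum_{j\geq j_{0}}\sum_{k\in\mathbb{Z}}\beta_{jk}\psi_{jk}\tag{1.4}$$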

where the wavelet coefficients are defined as

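$$\alpha_{j_{0}k}=\int_{\mathbb{R}}f(x)\varphi_{j_{0}k}(x)\,\mathrm{d}x\quad\text{and}\quad\beta_{jk}=\int_{\mathbb{R}}f(x)\psi_{jk}(x)\,\mathrm{d}x.$$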

An attractive property of wavelet expansions such as (1.4) is that membership of $f$ in a Besov space can be characterised in terms of its wavelet coefficients with respect to a well chosen wavelet basis. In the sequel, we will work under the following assumption on the father wavelet $\varphi$.

Assumption 1.1.

Following [Här+98], we assume that the father wavelet function $\varphi$ generates a multiresolution analysis of $\mathbb{L}^{2}(\mathbb{R})$ , that it is $N+1$ times weakly differentiable for some integer $N$ such that $0<s<N+1$ , and that its derivative satisfies $\sup_{x}\sum_{k}\lvert\varphi^{(N+1)}(x-k)\rvert<\infty$ a.e. Moreover, we assume that there exists a bounded, non-increasing function $\Phi$ on $\mathbb{R}_{+}$ such that $\lvert\varphi(u)\rvert\leq\Phi(\lvert u\rvert)$ and that both $\int\Phi(\lvert u\rvert)\mathrm{d}u<\infty$ and $\int\Phi(\lvert u\rvert)\lvert u\rvert^{N}\mathrm{d}u<\infty$ .

If the father wavelet function $\varphi$ verifies Assumption 1.1 then, given parameters $s>0$ and $1\leq p,q\leq\infty$ , the fact that $f$ belongs to the Besov space $\mathcal{B}^{s}_{pq}$ is equivalent to $J_{spq}(f)<\infty$ where

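$$J_{spq}(f):=\lVert\alpha_{0\cdot}\rVert_{p}+\bigg(\sum_{j\geq 0}\big(2^{j(s+1/2-1/p)}\lVert\beta_{j\cdot}\rVert_{p}\big)^{q}\bigg)^{1/q}$$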

for $1\leq q<\infty$ and the usual modification if $q=\infty$ . Fixing such a wavelet basis, we consider Besov ellipsoids defined as

$$\mathcal{B}^{s}_{pq}(L)=\{f:\mathbb{R}\to\mathbb{R}:J_{spq}(f)\leq L\}.$$

Since our interest is in density estimation, a quite natural class to consider is

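$$\mathcal{D}^{s}_{pq}=\mathcal{D}^{s}_{pq}(L,T)=\Big\{f:f\in\mathcal{B}^{s}_{pq}(L),\ f\geq 0,\ \int_{\mathbb{R}}f(x)\,\mathrm{d}x=1\text{ and }\mathrm{supp}(f)\subseteq[-T,T]\Big\},$$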

where $\mathrm{supp}(f)$ denotes the support of the function $f$. Note that we consider here the Besov smoothness of $f$ as a function defined on the whole real line, or, equivalently, that $f$ belongs to a periodic Besov class. It would equally be possible to define Besov smoothness over the support $[-T,T]$. Then the wavelet basis has to be boundary corrected so that it detects the smoothness on this interval only and not the potential lack of smoothness of $f$ at its boundary. We refer the reader to [GN16] for boundary corrected wavelets, which also possess all the properties that we need in the sequel.

It is well-known [GN16, Här+98, Don+96] that

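$$\mathcal{R}_{n}(\lVert\cdot\rVert_{r}^{r},\mathcal{D}^{s}_{pq})\gtrsim\mathfrak{r}_{n},\quad\text{where}\quad\mathfrak{r}_{n}=\begin{cases}n^{-\frac{rs}{2s+1}},&\text{if }p>\frac{r}{2s+1},\\(n/\log n)^{-\frac{r(s-1/p+1/r)}{2(s-1/p)+1}},&\text{if }p\leq\frac{r}{2s+1},\end{cases}\tag{1.5}$$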

and these rates are optimal or suboptimal by a logarithmic factor only (see [Här+98] for an extensive discussion). The structural change of the rate between dense zone (where $p>r/(2s+1)$ ) and sparse zone (where $p\leq r/(2s+1)$ ) is sometimes called an elbow effect.
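To make the elbow in the non-private rate concrete, here is a small sketch that evaluates the two branches of $\mathfrak{r}_{n}$ for given parameters. The function name `nonprivate_rate` and the zone labels are hypothetical, and constants are ignored since the rate is only defined up to a multiplicative factor.

```python
import math

def nonprivate_rate(n, s, p, r):
    """Return the zone and the non-private rate r_n (constants ignored)."""
    if p > r / (2.0 * s + 1.0):
        # Dense zone: the usual n^{-rs/(2s+1)} rate.
        return "dense", n ** (-r * s / (2.0 * s + 1.0))
    # Sparse zone: the effective smoothness is s - 1/p and a log factor appears.
    sp = s - 1.0 / p
    exponent = r * (sp + 1.0 / r) / (2.0 * sp + 1.0)
    return "sparse", (n / math.log(n)) ** (-exponent)

print(nonprivate_rate(1000, 1.0, 2.0, 2.0))   # p = 2 > r/(2s+1) = 2/3
print(nonprivate_rate(1000, 1.0, 1.0, 4.0))   # p = 1 <= r/(2s+1) = 4/3
```

The switch between the two branches at $p=r/(2s+1)$ is the elbow described above.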

Moreover, in the dense case, we can distinguish the homogeneous zone when $p\geq r$ and the non-homogeneous zone where $r/(2s+1)<p<r$ . In the homogeneous case, linear wavelet estimators of the form

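$$\sum_{k\in\mathbb{Z}}\alpha^{\prime}_{j_{0}k}\varphi_{j_{0}k}(x)+\sum_{j=j_{0}}^{j_{1}}\sum_{k\in\mathbb{Z}}\beta^{\prime}_{jk}\psi_{jk}(x)$$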

with $\alpha^{\prime}_{j_{0}k}=\frac{1}{n}\sum_{i=1}^{n}\varphi_{j_{0}k}(X_{i})$ , $\beta^{\prime}_{jk}=\frac{1}{n}\sum_{i=1}^{n}\psi_{jk}(X_{i})$ , and appropriately chosen $j_{0},j_{1}$ are rate optimal whereas linear procedures are necessarily sub-optimal in the non-homogeneous case (see [Här+98] and references therein). In this latter scenario as well as in the sparse case, non-linear estimators based on wavelet thresholding turn out to be optimal at least up to logarithmic factors.
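The linear wavelet estimator can be sketched in a few lines for the Haar basis. This is an illustration under assumptions that are not the paper's: the Haar wavelet does not satisfy the smoothness requirements of Assumption 1.1, the data are taken to lie in $[0,1)$, $j_{0}=0$ is fixed, and the names `haar_phi`, `haar_psi` and `linear_haar_estimator` are hypothetical.

```python
import numpy as np

def haar_phi(x):
    """Haar father wavelet: indicator of [0, 1)."""
    x = np.asarray(x, dtype=float)
    return ((0.0 <= x) & (x < 1.0)).astype(float)

def haar_psi(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1)."""
    x = np.asarray(x, dtype=float)
    return haar_phi(2.0 * x) - haar_phi(2.0 * x - 1.0)

def linear_haar_estimator(sample, j1, grid):
    """Evaluate the linear wavelet estimator with j0 = 0 on `grid`."""
    # Empirical father coefficient; only k = 0 matters for data in [0, 1).
    estimate = np.mean(haar_phi(sample)) * haar_phi(grid)
    for j in range(j1 + 1):
        for k in range(2 ** j):
            scale = 2.0 ** (j / 2)
            # Empirical mother coefficient beta'_{jk} = mean of psi_jk(X_i).
            beta_jk = np.mean(scale * haar_psi(2.0 ** j * sample - k))
            estimate = estimate + beta_jk * scale * haar_psi(2.0 ** j * grid - k)
    return estimate

rng = np.random.default_rng(0)
sample = rng.beta(2.0, 5.0, size=500)          # toy data supported in (0, 1)
grid = (np.arange(16) + 0.5) / 16              # midpoints of 16 dyadic bins
print(linear_haar_estimator(sample, 3, grid))
```

For the Haar basis, this estimator with highest level $j_{1}$ coincides with a histogram on $2^{j_{1}+1}$ dyadic bins, which gives a convenient sanity check.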

Minimax framework under privacy constraints

Let us now describe how to extend the classical minimax setup in order to encompass the framework of local differential privacy. Since not only the estimation procedure but also the Markov kernels guaranteeing local $\alpha$ -differential privacy can freely be chosen, it is natural to replace (1.2) with the local $\alpha$ -differential minimax risk defined as

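$$\mathcal{R}^{\ast}_{n,\alpha}(\ell,\mathcal{F})=\inf_{\substack{\widetilde{f}\\Q\in\mathcal{Q}_{\alpha}}}\sup_{f\in\mathcal{F}}\mathbb{E}_{f,Q}[\ell(\widetilde{f},f)].$$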

Here the infimum is taken both over all $(\mathcal{Z},\mathscr{Z})$ -measurable estimators of $f$ and all Markov kernels guaranteeing local $\alpha$ -differential privacy. A tuple $(\widehat{Q},\widehat{f})$ consisting of a privacy mechanism and an estimator $\widehat{f}$ is rate optimal (with respect to the local $\alpha$ -differential private risk) if

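$$\sup_{f\in\mathcal{F}}\mathbb{E}_{f,\widehat{Q}}[\ell(\widehat{f},f)]\leq C(\ell,\mathcal{F})\,\mathcal{R}^{\ast}_{n,\alpha}(\ell,\mathcal{F}).$$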

The quantity $\mathcal{R}^{\ast}_{n,\alpha}(\ell,\mathcal{F})$ and the construction of optimal privacy mechanisms and estimators are the principal interest of the rest of the paper.

Related work

Research on statistical estimation under privacy constraints is rather recent. A landmark paper is [WZ10], which initiated research on the subject and discussed density estimation via histograms and orthogonal series in the global privacy setup. In the same global framework, the article [HRW13] considers anonymization of functional data and discusses kernel density estimators as the main example. Local $\alpha$-differential privacy was intensively studied in [DJW13] and the companion article [DJW18]. In [DJW13] the authors show that the well-known technique of randomized response from survey statistics can be interpreted under the umbrella of local $\alpha$-differential privacy. In the context of density estimation, [DJW13] established minimax rates of convergence for the mean integrated squared error over Sobolev classes with arbitrary smoothness parameter $\beta\geq 1$. For $\beta=1$, they establish the minimax rate of order $n^{-\beta/(\beta+1)}$ and show that this optimal rate can be attained by Laplace perturbation of empirical histogram coefficients. The papers [DJW13, DJW18] also provide results for Sobolev classes with higher degrees of smoothness ($\beta>1$), but in this case a mere perturbation of the empirical Fourier coefficients does not lead to a rate optimal method (see [DJW13], Observation 1 for the non-optimality of this approach). By means of a more sophisticated sampling technique (see [DJW13], p. 11 or [DJW18], Section 5.2.2), however, the authors derive the minimax rate of convergence, which is $(n\alpha^{2})^{-\beta/(\beta+1)}$, also in the general case. Furthermore, [DJW13] provides private versions of classical information-theoretical bounds that allow standard lower bound techniques to be applied in the private setup as well.

In [RS18], the estimation of linear functionals in the framework of local privacy is considered and a characterisation of the rates of convergence in terms of moduli of continuity is obtained, which parallels well-known results for the non-private setup [DL91]. This general analysis contains the private estimation of a probability density at a fixed point under mean squared error as a special case.

Main results

In Section 2, in addition and in formal analogy to (1.5), we derive, under similar technical assumptions, the following lower bound on the private minimax risk:

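$$\mathcal{R}^{\ast}_{n,\alpha}(\lVert\cdot\rVert_{r}^{r},\mathcal{D}^{s}_{pq})\gtrsim\mathfrak{r}^{\ast}_{n,\alpha},\quad\text{where}\quad\mathfrak{r}^{\ast}_{n,\alpha}=\begin{cases}(n(e^{\alpha}-1)^{2})^{-\frac{rs}{2s+2}},&\text{if }p>\frac{r}{s+1},\\\Big(\frac{n(e^{\alpha}-1)^{2}}{\log(n(e^{\alpha}-1)^{2})}\Big)^{-\frac{r(s-1/p+1/r)}{2(s-1/p)+2}},&\text{if }p\leq\frac{r}{s+1}.\end{cases}\tag{1.6}$$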

This lower bound is complemented by corresponding upper bound results: The anonymisation technique used to create the private views of the non-releasable data $X_{1},\ldots,X_{n}$ consists in an appropriately scaled version of the classical Laplace mechanism applied on the empirical wavelet coefficients (Section 3). The wavelet estimators considered in Sections 4 and 5 are based on the availability of the privatised data $Z_{1},\ldots,Z_{n}$ only. As in the non-private case, a linear wavelet estimator attains the given rate in the homogeneous case, that is, whenever $p\geq r$ (Section 4). In Section 5, we study non-linear estimators and show that an estimator using hard thresholding can nearly attain the lower bounds both in the dense and in the sparse zone.

Notational conventions

For real numbers $a,b$ we write $\llbracket a,b\rrbracket=[a,b]\cap\mathbb{Z}$. We denote with $C$ a generic constant that might change with every appearance. For two sequences $\{a_{n}\}_{n},\,\{b_{n}\}_{n}$, we denote by $a_{n}\lesssim b_{n}$ that there exist some constant $C>0$ and a fixed integer $N$ such that $a_{n}\leq Cb_{n}$ for all $n\geq N$. We say that $a_{n}\asymp b_{n}$ if both $a_{n}\lesssim b_{n}$ and $b_{n}\lesssim a_{n}$. If $b_{n}>0$, we denote by $a_{n}\simeq b_{n}$ the fact that $a_{n}/b_{n}\to 1$ as $n\to\infty$. We recall that a centred Laplace distribution with parameter $\lambda>0$ has the probability density function $p_{\lambda}(x)=\frac{1}{2\lambda}\exp(-\frac{\lvert x\rvert}{\lambda})$ for all real numbers $x$. In particular, if $X\sim p_{\lambda}$, then $\mathbb{E}\lvert X\rvert^{k}=k!\lambda^{k}$ for all $k\in\mathbb{N}$.
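The moment identity for the Laplace distribution follows from a direct computation, sketched here for completeness using the symmetry of $p_{\lambda}$ and the substitution $u=x/\lambda$:

$$\mathbb{E}\lvert X\rvert^{k}=\int_{\mathbb{R}}\lvert x\rvert^{k}\,\frac{1}{2\lambda}e^{-\lvert x\rvert/\lambda}\,\mathrm{d}x=\frac{1}{\lambda}\int_{0}^{\infty}x^{k}e^{-x/\lambda}\,\mathrm{d}x=\lambda^{k}\int_{0}^{\infty}u^{k}e^{-u}\,\mathrm{d}u=\lambda^{k}\,\Gamma(k+1)=k!\,\lambda^{k}.$$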

2. Lower bounds

The purpose of this section is to derive (1.6) and hence to provide an analogue of (1.5) under local $\alpha$-differential privacy. To this end, we proceed in two steps. The first lower bound, given in Proposition 2.1, is stronger in the private dense zone ($p>r/(s+1)$), whereas the second one, given in Proposition 2.2, dominates in the private sparse zone where $p\leq r/(s+1)$. An essential tool for both proofs is a strong information theoretical inequality (our Proposition A.1) proved in [DJW18], which states a bound for the Kullback-Leibler divergence between any distributions that have been processed through an arbitrary channel guaranteeing local $\alpha$-differential privacy. We begin with the lower bound that is dominating in the dense zone.

Proposition 2.1.

Let $\alpha\in(0,\infty)$ and let $L,T>0$ . Then,

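$$\inf_{\substack{\widetilde{f}\\Q\in\mathcal{Q}_{\alpha}}}\sup_{f\in\mathcal{D}^{s}_{pq}(L,T)}\mathbb{E}_{f,Q}\lVert\widetilde{f}-f\rVert_{r}^{r}\gtrsim(n(e^{\alpha}-1)^{2})^{-\frac{rs}{2s+2}},$$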

where the infimum is taken over all estimators $\widetilde{f}$ based on the private views $Z_{1},\ldots,Z_{n}$ and all Markov kernels $Q\in\mathcal{Q}_{\alpha}$ guaranteeing local $\alpha$ -differential privacy.

The proof of Proposition 2.1 is based on a reduction of the class $\mathcal{D}^{s}_{pq}$ to a finite number of hypotheses indexed by the vertices of a hypercube of suitable dimension. It is given in Section A.1 in the appendix.

The following proposition complements Proposition 2.1 in stating a lower bound that is stronger in the private sparse zone.

Proposition 2.2.

Let $\alpha\in(0,\infty)$ . Let $p\geq 1$ , $s\geq 1/p$ and let $L,T>0$ . Then,

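$$\inf_{\substack{\widetilde{f}\\Q\in\mathcal{Q}_{\alpha}}}\sup_{f\in\mathcal{D}^{s}_{pq}(L,T)}\mathbb{E}_{f,Q}\lVert\widetilde{f}-f\rVert_{r}^{r}\gtrsim\bigg(\frac{\log(n(e^{\alpha}-1)^{2})}{n(e^{\alpha}-1)^{2}}\bigg)^{r\cdot\frac{s-1/p+1/r}{2(s-1/p)+2}},$$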

where the infimum is taken over all estimators $\widetilde{f}$ based on the private views $Z_{1},\ldots,Z_{n}$ and all channels $Q\in\mathcal{Q}_{\alpha}$ providing local $\alpha$ -differential privacy.

The proof of Proposition 2.2 is given in Section A.2 in the appendix.

Taking the maximum of the lower bounds obtained in Propositions 2.1 and 2.2 yields (1.6). In addition to our novel lower bounds, the known bounds (1.5) from the non-private framework still hold true under local $\alpha$ -differential privacy since processing the original data $X_{1},\ldots,X_{n}$ through a privacy mechanism can be interpreted equivalently as imposing a restriction on the set of admissible estimators in (1.2). More precisely, the constraint of local $\alpha$ -differential privacy confines the set of potential estimators to those of the form $\widetilde{f}=f\circ Q$ where $Q\in\mathcal{Q}_{\alpha}$ and $f$ is any measurable function. Thus,

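$$\mathcal{R}^{\ast}_{n,\alpha}\geq\mathcal{R}_{n}\vee\mathfrak{r}^{\ast}_{n,\alpha}\geq\mathfrak{r}_{n}\vee\mathfrak{r}^{\ast}_{n,\alpha},$$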

where the quantity $\mathfrak{r}_{n}$ is defined in (1.5). Hence, the following corollary holds.

Corollary 2.1.

Let the assumptions of Propositions 2.1 and 2.2 hold true. Then,

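$$\mathcal{R}^{\ast}_{n,\alpha}(\lVert\cdot\rVert_{r}^{r},\mathcal{D}^{s}_{pq})\gtrsim\begin{cases}n^{-\frac{rs}{2s+1}}\vee(n(e^{\alpha}-1)^{2})^{-\frac{rs}{2s+2}},&\text{if }p>\frac{r}{s+1},\\n^{-\frac{rs}{2s+1}}\vee\Big(\frac{n(e^{\alpha}-1)^{2}}{\log(n(e^{\alpha}-1)^{2})}\Big)^{-\frac{r(s-1/p+1/r)}{2(s-1/p)+2}},&\text{if }\frac{r}{2s+1}<p\leq\frac{r}{s+1},\\\Big(\frac{n}{\log n}\Big)^{-\frac{r(s-1/p+1/r)}{2(s-1/p)+1}}\vee\Big(\frac{n(e^{\alpha}-1)^{2}}{\log(n(e^{\alpha}-1)^{2})}\Big)^{-\frac{r(s-1/p+1/r)}{2(s-1/p)+2}},&\text{if }p\leq\frac{r}{2s+1}.\end{cases}$$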

Note that the frontier between the dense and the sparse zone in the private framework is different from the one in the non-private framework, leading to a partition into three regimes for the lower bound and a twofold elbow effect. These lower bounds match the upper bounds derived in Sections 4 and 5 at most up to logarithmic factors whenever $\alpha$ stays bounded as $n$ increases. In addition, the bounds from the non-private setup dominate provided that $\alpha$ increases sufficiently fast in terms of $n$.
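The three regimes of Corollary 2.1 can be explored numerically with a short sketch. The function name `private_lower_bound` and the regime labels are hypothetical, constants are ignored, and the code assumes $n(e^{\alpha}-1)^{2}>1$ so that the logarithm is positive.

```python
import math

def private_lower_bound(n, alpha, s, p, r):
    """Regime and rate of the lower bound in Corollary 2.1 (constants ignored)."""
    m = n * (math.exp(alpha) - 1.0) ** 2      # effective sample size under privacy
    if n <= 1 or m <= 1.0:
        raise ValueError("need n > 1 and n * (exp(alpha) - 1)^2 > 1")
    sp = s - 1.0 / p
    dense_np = n ** (-r * s / (2.0 * s + 1.0))
    sparse_np = (n / math.log(n)) ** (-r * (sp + 1.0 / r) / (2.0 * sp + 1.0))
    dense_pr = m ** (-r * s / (2.0 * s + 2.0))
    sparse_pr = (m / math.log(m)) ** (-r * (sp + 1.0 / r) / (2.0 * sp + 2.0))
    if p > r / (s + 1.0):
        return "dense in both setups", max(dense_np, dense_pr)
    if p > r / (2.0 * s + 1.0):
        return "non-private dense, private sparse", max(dense_np, sparse_pr)
    return "sparse in both setups", max(sparse_np, sparse_pr)

for p, r in [(2.0, 2.0), (1.5, 4.0), (1.0, 4.0)]:
    print(p, r, private_lower_bound(10**6, 1.0, 1.0, p, r))
```

With $s=1$ the two frontiers are $p=r/2$ and $p=r/3$, so the three parameter pairs in the loop fall into the three different regimes.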

3. Privacy mechanisms

Let us denote with $X_{1},\ldots,X_{n}$ the real-valued random variables that represent the non-private observations held by the different data holders. We assume that $X_{1},\ldots,X_{n}\sim f$ for $f\in\mathcal{D}^{s}_{pq}$ . In particular, the support of the density $f$ is contained in the interval $[-T,T]$ . In this section, we introduce a non-interactive privacy mechanism creating a private release $Z_{1},\ldots,Z_{n}$ based on the non-private sample that satisfies the defining property of $\alpha$ -differential privacy. For this purpose, we consider a wavelet basis as in (1.3). We assume in the sequel that the following condition on the parent wavelets is satisfied:

[TABLE]

The idea of the proposed anonymisation technique is to mask the empirical wavelet coefficients $\alpha^{\prime}_{j_{0}k}$ and $\beta^{\prime}_{jk}$ for certain values of $j$ . A consequence of (W1) and the compact support of $f$ is that for any $j_{0}\in\mathbb{Z}$ and any fixed resolution level $j\in\mathbb{Z}$ , the corresponding $\alpha_{j_{0}k}$ and $\beta_{jk}$ can a priori be non-zero for a finite number of $k$ only. We denote the set of $k$ with potentially non-zero $\alpha_{j_{0}k}$ by $\mathcal{N}_{j_{0}-1}$ . Analogously, for $j\geq j_{0}$ , the set of $k$ with potentially non-zero $\beta_{jk}$ is denoted with $\mathcal{N}_{j}$ .

Let us now define two privacy mechanisms that will turn out to be convenient for the purposes of this paper. It will be sufficient to consider $j_{0},\,j_{1}\in\mathbb{N}$ from now on.

First privacy mechanism

For $i\in\llbracket 1,n\rrbracket$ , $j\in\llbracket j_{0}-1,j_{1}\rrbracket$ , define

[TABLE]

where $W_{ijk}$ are independent Laplace distributed random variables with parameter $1$ ,

[TABLE]

for $j\in\llbracket j_{0},j_{1}\rrbracket$ with $c_{A}=2\lceil A\rceil+1$ .

Second privacy mechanism

For $i\in\llbracket 1,n\rrbracket$ , $j\in\llbracket j_{0}-1,j_{1}\rrbracket$ , define

[TABLE]

where $W_{ijk}$ are independent Laplace distributed random variables with parameter $1$ ,

[TABLE]

for $j\in\llbracket j_{0},j_{1}\rrbracket$ with $c_{A}=2\lceil A\rceil+1$ and some $\nu>1$ .
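The idea of masking wavelet coefficients with scaled Laplace noise can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the mechanisms (3.1) or (3.2) themselves: it uses the Haar basis on $[0,1)$ , takes $j_{0}=0$ , and splits the privacy budget $\alpha$ equally across resolution levels, so that the noise scale at level $j$ is proportional to $2^{j/2}/\alpha$ . The function names and the particular scaling are hypothetical; the scaling $\sigma_{j}$ used in the paper is the one given in the displays above.

```python
import random


def haar_father(x):
    """Haar father wavelet: indicator of [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def haar_mother(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def coefficients(x, j1):
    """Map x to {(j, k): psi_jk(x)}; level -1 holds the father coefficient."""
    coeffs = {(-1, 0): haar_father(x)}
    for j in range(j1 + 1):
        for k in range(2 ** j):
            coeffs[(j, k)] = 2 ** (j / 2) * haar_mother(2 ** j * x - k)
    return coeffs


def noise_scales(j1, alpha):
    """Assumed per-level Laplace scales: equal split of alpha over all levels.

    At level j the coefficient vectors of two inputs differ by at most
    2 * 2^(j/2) in l1-norm (Haar: one non-zero entry per level and input).
    """
    levels = j1 + 2  # father level -1 plus detail levels 0..j1
    scales = {-1: 2.0 * levels / alpha}
    for j in range(j1 + 1):
        scales[j] = 2.0 * 2 ** (j / 2) * levels / alpha
    return scales


def privatise(x, j1, alpha, rng):
    """Non-interactive private view of x: coefficients plus Laplace noise."""
    scales = noise_scales(j1, alpha)
    view = {}
    for (j, k), value in coefficients(x, j1).items():
        # Difference of two Exp(1) variables is Laplace with parameter 1.
        laplace = rng.expovariate(1.0) - rng.expovariate(1.0)
        view[(j, k)] = value + scales[j] * laplace
    return view


def log_density_ratio(z, x, x_prime, j1, alpha):
    """log q(z | x) - log q(z | x') for the product Laplace density."""
    scales = noise_scales(j1, alpha)
    cx, cxp = coefficients(x, j1), coefficients(x_prime, j1)
    return sum(
        (abs(z[key] - cxp[key]) - abs(z[key] - cx[key])) / scales[key[0]]
        for key in z
    )
```

By the reverse triangle inequality, each level contributes at most $\alpha$ divided by the number of levels to the log-density ratio, so the ratio is bounded by $\alpha$ in absolute value for every output `z` and every pair of inputs.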

Note that both privacy mechanisms in (3.1) and (3.2) are non-interactive because $Z_{ijk}$ does not depend on $X_{i^{\prime}}$ for $i^{\prime}\neq i$ . The following proposition shows that, for both privacy mechanisms, $Z_{i}=(Z_{ijk})_{j\in\llbracket j_{0}-1,j_{1}\rrbracket,k\in\mathcal{N}_{j}}$ satisfies the condition of $\alpha$ -differential privacy.

Proposition 3.1.

The privacy mechanisms given in (3.1) and (3.2) are locally $\alpha$ -differentially private.

Proof.

By definition of the privacy mechanism in (3.1), the conditional density of $Z_{i}$ given $X_{i}=x_{i}$ can be written as

[TABLE]

Thus, by the reverse and the ordinary triangle inequality,

[TABLE]

Note that for any fixed $x_{i}$ and arbitrary $j$ , $\psi_{jk}(x_{i})\neq 0$ holds only for at most $c_{A}=2\lceil A\rceil+1$ different $k$ , and the same argument is valid for $\varphi_{j_{0}k}(x_{i})$ . Thus,

[TABLE]

For the privacy mechanism (3.2), analogous calculations yield for the conditional density of $Z_{i}$ given $X_{i}=x_{i}$ that

[TABLE]

where we used that $\sum_{j=j_{0}}^{j_{1}}(j\vee 1)^{-\nu}\leq\sum_{j=0}^{\infty}(j\vee 1)^{-\nu}$ and $\sum_{j=2}^{\infty}j^{-\nu}\leq(\nu-1)^{-1}$ . ∎
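The two series bounds used at the end of the proof combine into $\sum_{j=0}^{\infty}(j\vee 1)^{-\nu}\leq 2+(\nu-1)^{-1}=(2\nu-1)/(\nu-1)$ , since the terms for $j=0$ and $j=1$ both equal $1$ . This is the factor that appears in the constant $\sigma$ of Theorem 5.1. The following small numerical check is only a sanity test of this elementary inequality; the function names are illustrative.

```python
def weight_sum(nu, terms):
    """Partial sum of (j v 1)^(-nu) over j = 0, ..., terms - 1."""
    return sum(max(j, 1) ** (-nu) for j in range(terms))


def series_bound(nu):
    """Closed-form bound 2 + 1 / (nu - 1) = (2 nu - 1) / (nu - 1), for nu > 1."""
    return (2.0 * nu - 1.0) / (nu - 1.0)


for nu in (1.1, 1.5, 2.0, 3.0):
    # Partial sums increase with the number of terms and stay below the bound.
    print(nu, weight_sum(nu, 10000), series_bound(nu))
```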

4. Upper bound for linear wavelet estimators

The expansion (1.4) suggests considering estimators of the form

[TABLE]

with appropriate estimators $\widehat{\alpha}_{j_{0}k}$ and $\widehat{\beta}_{jk}$ of $\alpha_{j_{0}k}$ and $\beta_{jk}$ , respectively. Note that in the local private framework, estimators of the wavelet coefficients are allowed to depend on the private views $Z_{ijk}$ only but not on the hidden $X_{i}$ . For the results concerning the linear estimator in this section, it suffices to consider the case $j_{0}=0$ . In this case we put $\psi_{-1,k}=\varphi_{0,k}$ and define a linear wavelet estimator through

[TABLE]

Since $\mathbb{E}W_{ijk}=0$ , the definition of $\widehat{\beta}_{jk}$ is natural and provides an unbiased estimate of the true wavelet coefficient $\beta_{jk}$ .
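Unbiasedness of the averaged private views can be illustrated by a short simulation. The sketch below is a toy setting chosen for illustration only: it assumes the Haar mother wavelet, the single coefficient $\beta_{00}$ , the density $f(x)=2x$ on $[0,1)$ (for which $\beta_{00}=1/4-3/4=-1/2$ ), and an arbitrary noise scale `sigma`; none of these choices is taken from the paper.

```python
import random


def haar_mother(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def private_view(x, sigma, rng):
    """Z = psi(X) + sigma * W with W Laplace distributed with parameter 1."""
    laplace = rng.expovariate(1.0) - rng.expovariate(1.0)
    return haar_mother(x) + sigma * laplace


def estimate_coefficient(views):
    """Empirical mean of the private views, as in the linear estimator."""
    return sum(views) / len(views)


rng = random.Random(2019)
# X = sqrt(U) with U uniform on [0, 1) has density f(x) = 2x on [0, 1).
sample = [rng.random() ** 0.5 for _ in range(20000)]
views = [private_view(x, sigma=2.0, rng=rng) for x in sample]
beta_hat = estimate_coefficient(views)
print(beta_hat)  # should be close to the true coefficient -0.5
```

The Laplace noise inflates the variance of the estimate by a term of order $\sigma^{2}/n$ but leaves its mean unchanged.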

The following proposition provides an upper bound for the estimator $\widehat{f}_{\mathrm{lin}}$ in the so-called matched case when $r=p$ . Its proof is given in Appendix B.

Proposition 4.1.

Assume that the father wavelet $\varphi$ satisfies Assumption 1.1. Let $1\leq p<\infty$ and let $Z_{ijk}$ be defined as in (3.1). Then

[TABLE]

In particular, choosing $j_{1}=j_{1}(n,\alpha)$ such that

[TABLE]

we obtain

[TABLE]

The upper bound (4.3) suggests the following interpretation: As long as $\alpha^{2}\geq n^{1/(2s+1)}$ , the estimator $\widehat{f}_{\mathrm{lin}}$ attains the rate $n^{-ps/(2s+1)}$ known to be optimal when the sample $X_{1},\ldots,X_{n}$ is available. However, as soon as $\alpha^{2}<n^{1/(2s+1)}$ , this standard rate is deteriorated and the slower rate $(n\alpha^{2})^{-ps/(2s+2)}$ is attained. As in [DJW18], the alteration of the rate in comparison to the non-private framework concerns both the effective sample size (that changes from $n$ to $n\alpha^{2}$ ) and the exponent appearing in the rate. In contrast to the procedure suggested in [DJW18], however, the privacy mechanism (3.1) consists in a mere perturbation of the empirical wavelet coefficients by Laplace noise, and no further sampling technique is necessary to obtain a privacy channel enabling rate optimal estimation of $f$ .
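The two regimes described above can be made concrete with a small helper. The sketch below only evaluates the two rates named in the text, ignoring all constants; the function names are illustrative and not part of the paper.

```python
import math


def linear_rate(n, alpha, s, p):
    """Order of the matched-case risk bound, up to constants.

    The larger of the standard rate n^(-ps/(2s+1)) and the private rate
    (n * alpha^2)^(-ps/(2s+2)) is the one that drives the bound.
    """
    standard = n ** (-p * s / (2 * s + 1))
    private = (n * alpha ** 2) ** (-p * s / (2 * s + 2))
    return max(standard, private)


def privacy_is_costly(n, alpha, s):
    """True when alpha^2 < n^(1/(2s+1)), i.e. the private rate dominates."""
    return alpha ** 2 < n ** (1 / (2 * s + 1))


n, s, p = 10 ** 6, 1.0, 2.0
for alpha in (1.0, 10.0, 100.0):
    print(alpha, privacy_is_costly(n, alpha, s), linear_rate(n, alpha, s, p))
```

At the boundary $\alpha^{2}=n^{1/(2s+1)}$ one has $n\alpha^{2}=n^{(2s+2)/(2s+1)}$ , so both rates coincide, which is why the transition between the two regimes is continuous.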

Although the risk bound of Proposition 4.1 is valid only in the matched case, it can be extended to the case $r\neq p$ by means of the following corollary. Its proof is given in Appendix B.

Corollary 4.1.

Assume that the father wavelet $\varphi$ satisfies Assumption 1.1. Let $1\leq p,r<\infty$ and let $Z_{ijk}$ be defined as in (3.1), and put $s^{\prime}=s-(1/p-1/r)_{+}$ . Then, choosing $j_{1}$ as in (4.2) yields

[TABLE]

Corollary 4.1 together with Proposition 2.1 shows that the estimator $\widehat{f}_{\mathrm{lin}}$ is of optimal order in the dense homogeneous zone where $p\geq r$ (which is equivalent to $s=s^{\prime}$ ) and for $\alpha$ in $(0,\bar{\alpha}]$ . In analogy to [Don+96], it would be possible to suggest a non-linear estimation procedure depending on $s$ that is optimal (up to logarithmic factors in some cases) in the non-homogeneous dense case and in the sparse case as well. However, in Section 5, we directly propose a non-linear estimator that is adaptive to the smoothness $s$ of the underlying density (as well as to the other parameters $p$ and $q$ of the Besov space).

5. Upper bounds for the non-linear adaptive estimator

In this section, the privacy mechanism is given by (3.2) in Section 3. We study the theoretical properties of the non-linear wavelet estimators of the form

[TABLE]

where

[TABLE]

and $\widehat{\beta}_{jk}=\frac{1}{n}\sum_{i=1}^{n}Z_{ijk}$ as in Section 4 (the choice of $t$ and the value of the numerical constant $K$ are specified in Theorem 5.1 and its proof below). Thus, non-linearity enters only with respect to the estimation of the detail coefficients $\beta_{jk}$ .

Theorem 5.1.

Let the father wavelet $\varphi$ satisfy Assumption 1.1 for some integer $N>0$ . Let the private views $Z_{1},\ldots,Z_{n}$ of the sample $X_{1},\ldots,X_{n}$ be generated with the privacy mechanism in (3.2). Consider the estimator $\widetilde{f}_{n}$ defined in (5.1) with

$\bullet$

$j_{0}\in\mathbb{N}$ such that $2^{j_{0}}\asymp(n\alpha^{2})^{\frac{1}{2(N+1)+2}}\wedge n^{\frac{1}{2(N+1)+1}}$ ,

$\bullet$

$j_{1}=j_{1}^{\prime}\wedge j_{1}^{\prime\prime}$ where $j_{1}^{\prime}$ , $j_{1}^{\prime\prime}\in\mathbb{N}$ are such that

[TABLE]

$\bullet$

$K=4(\overline{L}+\sigma)$ for some $\overline{L}>0$ and $\sigma=4c_{A}\|\psi\|_{\infty}\cdot\frac{2\nu-1}{\nu-1}$ with $\nu$ introduced in the definition of the second privacy mechanism,

$\bullet$

$t=t_{j,n,\alpha}=\gamma\cdot\frac{j^{\nu+1/2}}{\sqrt{n}}\cdot(1\vee\frac{2^{j/2}}{\alpha})$ for $j\in\llbracket j_{0},j_{1}\rrbracket$ and some sufficiently large constant $\gamma$ (for instance, $\gamma\geq r(N+1)$ works).

Then, the risk bound

[TABLE]

where

[TABLE]

and where

[TABLE]

for some $0<\underline{L}\leq\overline{L}<\infty$ .

The proof of Theorem 5.1 is given in Appendix C. Note that both the privacy mechanism and the estimator in Theorem 5.1 are independent of the quantities $s$ , $p$ , $q$ , and $L$ . Hence, the proposed procedure is adaptive.
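The level-dependent threshold $t_{j,n,\alpha}$ from Theorem 5.1 is easy to evaluate. The sketch below computes it and applies a hard-thresholding rule to a single detail coefficient. It assumes that the thresholding rule keeps $\widehat{\beta}_{jk}$ when $\lvert\widehat{\beta}_{jk}\rvert>K\,t_{j,n,\alpha}$ and sets it to zero otherwise, which is the usual hard-thresholding convention and is an assumption here; the function names and the numerical values are illustrative.

```python
import math


def threshold(j, n, alpha, gamma, nu):
    """t_{j,n,alpha} = gamma * j^(nu + 1/2) / sqrt(n) * (1 v 2^(j/2) / alpha)."""
    return gamma * j ** (nu + 0.5) / math.sqrt(n) * max(1.0, 2 ** (j / 2) / alpha)


def hard_threshold(beta_hat, j, n, alpha, gamma, nu, K):
    """Assumed rule: keep the coefficient only if it exceeds K * t in modulus."""
    t = threshold(j, n, alpha, gamma, nu)
    return beta_hat if abs(beta_hat) > K * t else 0.0


# Illustrative values only; gamma, nu and K are not tuned as in the theorem.
j, n, alpha, gamma, nu, K = 4, 10000, 1.0, 2.0, 1.5, 1.0
print(threshold(j, n, alpha, gamma, nu))
print(hard_threshold(1.0, j, n, alpha, gamma, nu, K))
print(hard_threshold(2.0, j, n, alpha, gamma, nu, K))
```

The factor $1\vee 2^{j/2}/\alpha$ means that the threshold coincides with a non-private one (up to the power of $j$ ) at coarse levels where $2^{j/2}\leq\alpha$ , and grows like $2^{j/2}/\alpha$ at finer levels, reflecting the scale of the added noise.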

6. Discussion

In this article, we have suggested refined methods for density estimation under the constraint of local $\alpha$ -differential privacy. By the use of estimators based on wavelet expansions, we have been able to obtain adaptive procedures that attain the minimax rate of convergence up to an additional logarithmic factor only. To the best of our knowledge, adaptation to smoothness has not been considered in the framework of private estimation so far. Moreover, in allowing for general $\mathbb{L}^{r}$ -risk and Besov ellipsoids we have widened the range of results in the privacy framework that has merely focused on $\mathbb{L}^{2}$ -risk and Sobolev ellipsoids until now.

A significant difference between our approach and the one suggested in Section 5.2.2 of [DJW18] concerns the privacy mechanism: Whereas the procedure in [DJW18] is built on a rather sophisticated sampling strategy aiming at the perturbation of empirical Fourier coefficients, our privacy mechanism consists in a simple Laplace perturbation of empirical wavelet coefficients. In [DJW18] it has been observed (see the last paragraph of Section 5.2.2 in that paper) that such an approach is not feasible for the Fourier basis since it would lead to a suboptimal rate (under $\mathbb{L}^{2}$ -risk) of order $(n\alpha^{2})^{-2s/(2s+3)}$ over Sobolev ellipsoids of smoothness $s$ instead of the optimal rate $(n\alpha^{2})^{-s/(s+1)}$ . A heuristic explanation for the easier accessibility of the problem by means of wavelet bases is given by their well-known localisation properties in contrast to the global Fourier basis.

Note that wavelet methods in the non-private framework do not necessarily suffer from a logarithmic loss in the rate (see, for instance, [Don+96] where an additional logarithmic loss only appears in the dense zone). The fact that we encounter this type of loss in our private scenario is caused by the term $j^{\nu}$ in the definition of the privacy mechanism (3.2) and might be explained by the pointwise nature of the $\alpha$ -differential privacy constraint. The question of whether, and if so how, such logarithmic losses might be circumvented remains open and provides an interesting direction for future research.

Finally, let us sketch the connection between local private estimation in the non-interactive setup and statistical inverse problems, in particular, density deconvolution: On the one hand, in density deconvolution, the statistician is given a noisy sample $Z_{1},\ldots,Z_{n}$ where $Z_{i}=X_{i}+\varepsilon_{i}$ for $X_{i}\sim f$ and $\varepsilon_{i}\sim q$ . Here, the density $f$ is the quantity of interest and $q$ an error density which is (at least in the overwhelming part of the literature) supposed to be known. In this setup, the $Z_{i}$ are distributed according to the density $g$ where

[TABLE]

is the convolution of $f$ with the error density $q$ . It is well-known that the difficulty of reconstructing $f$ from the sample $Z_{1},\ldots,Z_{n}$ is linked with the degree of ill-posedness of the inverse problem $g=K_{q}f$ . The latter can be described either in terms of the sequence $(\lambda_{k}^{2})$ of eigenvalues of $K_{q}^{\ast}K_{q}$ ( $K_{q}^{\ast}$ denotes the adjoint operator of the linear operator $K_{q}$ ) or in terms of the decay of the Fourier transform of the error density $q$ . General inverse problems of the form $Kf=g$ have been thoroughly investigated in [Ker+07] in the framework of a Gaussian white noise model. For Besov smooth signals $f$ and $\lvert\lambda_{k}\rvert\asymp k^{-\rho}$ for some $\rho>0$ , [Ker+07] derived adaptive rates of estimation of $f$ proportional to

[TABLE]

On the other hand, the statistician who is given the non-interactive privatised sample $Z_{1},\ldots,Z_{n}$ is confronted with the problem of recovering $f$ from a sample from the mixture density

[TABLE]

which is a special instance of an inverse problem and strongly resembles (6.1). In contrast to (6.1), however, the operator $K$ is now not a priori given as a component of the problem but constitutes rather a part of its solution. In the local differential privacy framework, the statistician should select the operator $K$ , corresponding to the choice of a privacy mechanism, subject to the two following constraints. First, the condition (1.1) concerning $\alpha$ -differential privacy must hold. Second, the least possible amount of information should be smoothed out by the operator $K$ . More precisely, denoting with $\rho$ the degree of ill-posedness as above, the proofs of the lower bounds suggest that the least admissible value for $\rho$ is $1/2$ . Our privacy mechanisms, that is, our choices of $K$ , satisfy both constraints by leading to an overall estimation procedure that is nearly minimax. We emphasize that the above interpretation of the local differential private estimation problem does not, in principle, rule out privacy mechanisms that add noise directly to the random variables $X_{1},\ldots,X_{n}$ . As already mentioned, [DJW18] have noted that adding Laplace noise directly to the observations cannot lead to an optimal procedure. Indeed, the convolution operator in this case has degree of ill-posedness corresponding to $\rho=2$ , which yields too slow a rate.

Appendix A Proofs of Section 2

We distinguish in the sequel the dense case and the sparse case that require different explicit constructions. However, for both proofs of the lower bounds we need the existence of a function $f_{0}$ with the following properties (see [Här+98]):

$\bullet$

$f_{0}$ is a probability density,

$\bullet$

$J_{spq}(f_{0})\leq L/2$ ,

$\bullet$

$\mathrm{supp}(f_{0})\subseteq[-T,T]$ ,

$\bullet$

$f_{0}\equiv c_{0}>0$ on some interval $[a,b]$ .

In particular, $f_{0}\in\mathcal{D}^{s}_{pq}(L/2,T)$ .

The main tool in the proof of the lower bounds is adapted from [DJW18]. It allows us to reduce the problem to the study of the likelihoods of the non-privatized data and quantifies the loss of information in the process.

Suppose that we are given a finite indexed family of distributions $\{P_{\nu},\nu\in\mathcal{V}\}$ . Let $V$ denote a random variable that is uniformly distributed over $\mathcal{V}$ . Conditionally on $V=\nu$ , suppose we sample a random vector $(X_{1},\ldots,X_{n})$ according to the product measure $P_{\nu}^{\otimes n}=P_{\nu}\otimes\ldots\otimes P_{\nu}$ . Suppose that we draw an $\alpha$ -locally private sample $Z_{1},\ldots,Z_{n}$ according to a channel $Q$ . Conditioned on $V=\nu$ , $(Z_{1},\ldots,Z_{n})$ is distributed according to the measure $M_{\nu}^{n}$ given by

[TABLE]

where $Q^{n}(\cdot\mid x_{1},\ldots,x_{n})$ denotes the joint distribution on $\mathcal{Z}^{n}$ of the private sample $Z_{1:n}$ conditioned on $X_{1:n}=x_{1:n}$ . In this setup, we have the following inequality.

Lemma A.1.

[Based on [DJW18], Theorem 1] Let $\alpha\geq 0$ . For any $\alpha$ -locally differentially private conditional distribution $Q$ and any $\nu,\nu^{\prime}\in\mathcal{V}$ , $\nu\neq\nu^{\prime}$ , we have in the above setting

[TABLE]

Lemma A.1 quantifies the property that $\alpha$ -differential privacy acts as a contraction on the space of probability measures.

A.1. Proof of Proposition 2.1

It is sufficient to prove the lower bound for $n$ sufficiently large (the remaining finitely many $n$ might merely further reduce the value of the numerical constant $C$ ). Let $f_{0}$ be the function introduced above. For fixed $j$ (the choice of which will be specified later) define $\mathcal{I}_{j}$ as the maximal subset of $\mathbb{Z}$ such that $\mathrm{supp}(\psi_{jk})\subset[a,b]$ and $\mathrm{supp}(\psi_{jk})\cap\mathrm{supp}(\psi_{jk^{\prime}})=\emptyset$ if $k,k^{\prime}\in\mathcal{I}_{j}$ with $k\neq k^{\prime}$ . Note that $N_{j}\vcentcolon=|\mathcal{I}_{j}|\asymp 2^{j}$ . Define

[TABLE]

where $\gamma=c(n(e^{\alpha}-1)^{2})^{-\frac{2s+1}{2(2s+2)}}$ for $c$ sufficiently small and $2^{j}\asymp(n(e^{\alpha}-1)^{2})^{\frac{1}{2s+2}}$ . For $c$ sufficiently small, it holds $\gamma 2^{j/2}\|\psi\|_{\infty}\leq c_{0}$ , which ensures that $f_{\theta}$ is non-negative for all $\theta\in\Theta$ . One can easily check that $\int f_{\theta}=1$ and $\mathrm{supp}(f_{\theta})\subseteq[-T,T]$ for all $\theta\in\Theta$ . Moreover, by the definition of $\gamma$ , the choice of $j$ and the equivalence of norms, we have

[TABLE]

where the last inequality holds for $c$ sufficiently small. Hence, $\mathcal{F}\subset\mathcal{D}^{s}_{pq}(L,T)$ and

[TABLE]

Denoting by $\Delta_{jk}$ the support of $\psi_{jk}$ , it holds for any estimator $\widetilde{f}$ of $f$ that

[TABLE]

since $f_{\theta}\equiv g_{\theta_{k}}:=f_{0}+\gamma\theta_{k}\psi_{jk}$ on $\Delta_{jk}$ . Set

[TABLE]

and $\check{\theta}_{k}=\operatornamewithlimits{argmin}_{\theta\in\{0,1\}}\|\widetilde{f}-g_{\theta}\|_{r,\Delta_{jk}}$ . It follows from the triangle inequality that

[TABLE]

Thus,

[TABLE]

where $d_{H}$ denotes the Hamming distance. Therefore,

[TABLE]

In order to apply Lemma A.2, we need to bound the Kullback-Leibler divergence between two different distributions $M^{n}_{\theta}$ and $M_{\theta^{\prime}}^{n}$ of the private sample $(Z_{1},\ldots,Z_{n})$ resulting from the sample $X_{1},\ldots,X_{n}$ if, for all $i\in\llbracket 1,n\rrbracket$ , $X_{i}$ is distributed according to $f_{\theta}$ , $f_{\theta^{\prime}}$ with $d_{H}(\theta,\theta^{\prime})=1$ . We write $X_{i}\sim\mathbb{P}_{\theta}$ if $X_{i}$ has density $f_{\theta}$ . Using Lemma A.1 we obtain for any channel providing local $\alpha$ -differential privacy that

[TABLE]

Now, since $d_{H}(\theta,\theta^{\prime})=1$ and $\theta,\theta^{\prime}\in\Theta$ , there exists $k_{0}\in\mathcal{I}_{j}$ such that

[TABLE]

which implies that

[TABLE]

Applying Lemma A.2 from the appendix with $N=N_{j}\gtrsim 2^{j}$ implies

[TABLE]

This implies the statement of the proposition since $\widetilde{f}$ and the channel distribution were arbitrary.

A.2. Proof of Proposition 2.2

We consider $f_{0},\psi,\mathcal{I}_{j}$ and $N_{j}$ as in the proof of Proposition 2.1, but consider now the set

[TABLE]

where $j$ is chosen such that $2^{j}\simeq\big{(}\frac{n(e^{\alpha}-1)^{2}}{\log(n(e^{\alpha}-1)^{2})}\big{)}^{\frac{1}{2(s+1-1/p)}}$ and $\gamma=c2^{-j(s+1/2-1/p)}$ for $c$ sufficiently small. Let us first check that this choice of $j$ and $\gamma$ guarantees that $\mathcal{F}\subset\mathcal{D}^{s}_{pq}(L,T)$ . First, we have $f_{0}\in\mathcal{D}^{s}_{pq}(L,T)$ and one can easily check that $\int f_{k}=1$ and $\mathrm{supp}(f_{k})\subseteq[-T,T]$ for all $k\in\mathcal{I}_{j}$ . Then, for any $k\in\mathcal{I}_{j}$ , we have

[TABLE]

for $c$ sufficiently small. Furthermore, for any $k\in\mathcal{I}_{j}$ ,

[TABLE]

for $c$ sufficiently small. Hence, $\mathcal{F}\subset\mathcal{D}^{s}_{pq}(L,T)$ and

[TABLE]

Now, we show that for $k,k^{\prime}\in\mathcal{I}_{j}$ , $k\neq k^{\prime}$ , the hypotheses $f_{k}$ and $f_{k^{\prime}}$ , as well as the hypotheses $f_{k}$ and $f_{0}$ , are sufficiently separated in the sense of Lemma A.3. For such $k,k^{\prime}$ we have:

[TABLE]

For $k\in\{0\}\cup\mathcal{I}_{j}$ , let $M_{k}^{n}$ be the distribution of the private sample $(Z_{1},\ldots,Z_{n})$ resulting from the sample $X_{1},\ldots,X_{n}$ if for all $i\in\llbracket 1,n\rrbracket$ $X_{i}$ is distributed according to $f_{k}$ . For all $k\in\mathcal{I}_{j}$ we have $M_{k}^{n}\ll M_{0}^{n}$ . It remains to bound the quantity $\frac{1}{N_{j}}\sum_{k\in\mathcal{I}_{j}}\mathrm{KL}(M_{k}^{n},M_{0}^{n}).$ We write $X_{i}\sim\mathbb{P}_{k}$ if $X_{i}$ has density $f_{k}$ , $k\in\{0\}\cup\mathcal{I}_{j}$ . First consider the total variation distance between $\mathbb{P}_{k}$ and $\mathbb{P}_{0}$ for $k\in\mathcal{I}_{j}$ :

[TABLE]

and thus

[TABLE]

Applying Lemma A.1 gives

[TABLE]

Now, $\log(N_{j})\geq\log(C2^{j})$ and

[TABLE]

for $n$ sufficiently large, say $n\geq n_{0}$ . Putting this estimate into (A.1) yields

[TABLE]

for $n\geq n_{0}$ and $C<1/8$ for $c$ sufficiently small. We can then apply Lemma A.3, which yields for $n\geq n_{0}$ that

[TABLE]

The statement of the proposition follows since both the estimator $\widetilde{f}$ and the privacy mechanism considered were arbitrary.

A.3. Further auxiliary results for the lower bound proofs

The following lemma is a Kullback-Leibler version of Assouad’s lemma. As above, we denote by $d_{H}$ the Hamming distance, that is, $d_{H}(\theta,\theta^{\prime})=\sum_{i=1}^{d}\mathds{1}_{\{\theta_{i}\neq\theta_{i}^{\prime}\}}$ for $\theta,\theta^{\prime}\in\mathbb{R}^{d}$ .

Lemma A.2 ([Tsy09], p. 118, Theorem 2.12).

Denote with $\Theta=\{0,1\}^{N}$ the set of all binary sequences of length $N$ . Let $\{\mathbb{P}_{\theta}:\theta\in\Theta\}$ be a set of $2^{N}$ probability measures on some measurable space $(\mathcal{X},\mathscr{A})$ and let the corresponding expectations be denoted by $\mathbb{E}_{\theta}$ . Then

[TABLE]

provided that $\mathrm{KL}(\mathbb{P}_{\theta},\mathbb{P}_{\theta^{\prime}})\leq\beta<\infty$ for all $\theta,\theta^{\prime}\in\Theta$ with $d_{H}(\theta,\theta^{\prime})=1$ .

For the lower bound in the sparse case we need the following lemma taken from [Tsy09].

Lemma A.3 ([Tsy09], p. 101, Theorem 2.7).

Assume that $M\geq 1$ and suppose that $\Theta$ contains elements $\theta_{0},\theta_{1},\ldots,\theta_{M}$ such that:

(i)

$d(\theta_{j},\theta_{k})\geq 2\Psi>0$ , for all $0\leq j<k\leq M$ ,

(ii)

$\mathbb{P}_{j}\ll\mathbb{P}_{0}$ , for all $j=1,\ldots,M$ , and

[TABLE]

with $0<\beta<1/8$ and $\mathbb{P}_{j}=\mathbb{P}_{\theta_{j}}$ , $j=0,1,\ldots,M$ . Then

[TABLE]

Appendix B Proofs of Section 4

B.1. Proof of Proposition 4.1

We give the proof for $p>2$ only, which is based on Statement (ii) from Lemma B.1. The proof for $1\leq p\leq 2$ follows similarly using (i) instead. We decompose the risk of the estimator $\widehat{f}_{\mathrm{lin}}$ into approximation and stochastic error:

[TABLE]

The approximation term can be dealt with exactly as in the case of non-private data (see [Här+98], p. 130),

[TABLE]

and it remains to consider the stochastic term. Putting $\beta^{\prime}_{-1,k}=\frac{1}{n}\sum_{i=1}^{n}\varphi(X_{i}-k)$ and $\beta^{\prime}_{jk}=\frac{1}{n}\sum_{i=1}^{n}\psi_{jk}(X_{i})$ , we have

[TABLE]

which can be rewritten as

[TABLE]

where $K_{j}(x,y)=2^{j}\sum_{k}\varphi(2^{j}x-k)\bar{\varphi}(2^{j}y-k)$ . We further decompose

[TABLE]

The first term on the right-hand side is analysed as in the non-private setup (see [Här+98], p. 130) leading to the bound

[TABLE]

For the remaining term, we have by Tonelli’s theorem

[TABLE]

where $\Delta$ is some compact set the length of which depends on $A$ and $T$ only. The expectation inside the integral is bounded from above by Rosenthal’s inequality (Statement (ii) of Lemma B.1):

[TABLE]

Recall the definition of $\psi_{jk}$ and note that, owing to the boundedness of the support of the wavelet parents $\varphi$ and $\psi$ , we have for any $x$ and fixed $j$ that $\psi_{jk}(x)\neq 0$ only for a finite number of $k$ that is independent of $j$ . Thus, the last expression can be bounded from above as follows

[TABLE]

Thus,

[TABLE]

Combining (B.1) and (B.2) yields

[TABLE]

which proves (4.1). Choosing $j_{1}=j_{1}(n,\alpha)$ as in (4.2) immediately implies (4.3).

B.2. Proof of Corollary 4.1

We distinguish between the cases $p>r$ and $p\leq r$ .

1. Case: $p>r$

In this case, $s^{\prime}=s$ . Let us consider the estimator $\widehat{f}_{\mathrm{lin}}$ with $j_{1}$ chosen as in Proposition 4.1. First note that the Lebesgue measure of $\mathrm{supp}(\widehat{f}_{\mathrm{lin}}-f)$ is bounded from above by a constant $\bar{C}>0$ . Then, applying Hölder’s inequality and Proposition 4.1 yields

[TABLE]

2. Case: $p\leq r$

In this case, $s^{\prime}=s-1/p+1/r$ . Owing to the Besov embedding, we have $\mathcal{B}^{s}_{pq}\subset\mathcal{B}^{s^{\prime}}_{rq}$ , which implies $\mathcal{D}^{s}_{pq}\subset\mathcal{D}^{s^{\prime}}_{rq}$ . Thus, again using the upper bound for the matched case from Proposition 4.1,

[TABLE]

which is the desired bound for the case $p\leq r$ .

B.3. Inequalities for moments of sums of independent random variables

Lemma B.1.

Let $X_{1},\ldots,X_{n}$ be independent centred random variables with $\mathbb{E}[|X_{i}|^{r}]<\infty$ .

(i)

If $0<r\leq 2$ , then

[TABLE]

(ii)

If $r>2$ , then there exists a constant $C=C(r)$ such that

[TABLE]

Inequality (i) follows directly from Jensen’s inequality and concavity of $x\mapsto x^{r/2}$ for $0<r\leq 2$ . For a proof of inequality (ii) we refer to [Pet95], p. 59, Theorem 2.9.

Appendix C Proof of Theorem 5.1

This section is devoted to the proof of Theorem 5.1. The main reasoning is given in Section C.1 but some tedious calculations for this proof are deferred to Section C.2. Sections C.3 and C.4 contain auxiliary results used in Section C.2.

C.1. Proof of Theorem 5.1

As in the proof of Corollary 4.1, it is sufficient to prove the result for $p\leq r$ ; the result for $p>r$ can then be deduced as in the proof of that corollary.

We consider the upper bound $\mathbb{E}\lVert\widetilde{f}_{n}-f\rVert_{r}^{r}\leq 3^{r-1}(\mathbb{E}\lVert A\rVert_{r}^{r}+\mathbb{E}\lVert B\rVert_{r}^{r}+\lVert C\rVert_{r}^{r})$ where

[TABLE]

We consider the risk bounds for $\mathbb{E}\|A\|_{r}^{r}$ , $\mathbb{E}\|B\|_{r}^{r}$ , and $\|C\|_{r}^{r}$ separately.

Upper bound for the term $\mathbb{E}\|A\|_{r}^{r}$ :

Putting $\alpha^{\prime}_{j_{0}k}=\frac{1}{n}\sum_{i=1}^{n}\varphi_{j_{0}k}(X_{i})$ it holds

[TABLE]

The first term on the right-hand side is bounded by the compact support assumption on $\varphi$ and using Lemma 1 from [Don+96] as in the non-private case (see [Don+96], p. 522):

[TABLE]

Concerning the second term, first, by Fubini’s theorem

[TABLE]

and the integrand on the right-hand side can be bounded as follows: for $r>2$ ,

[TABLE]

whereas for $r\leq 2$ ,

[TABLE]

Thus, altogether,

[TABLE]

Hence, for our choice of $j_{0}$ and owing to $s<N+1$ from Assumption 1.1, we obtain

[TABLE]

and the bound on the right-hand side is the claimed rate.

Upper bound for the term $\mathbb{E}\lVert B\rVert_{r}^{r}$ :

We consider the sets

[TABLE]

and the decomposition

[TABLE]

Appropriate bounds for the four terms $e_{bs},e_{bb},e_{sb},e_{ss}$ are derived in Appendix C.2.

Upper bound for the term $\lVert C\rVert_{r}^{r}$ :

In the case we consider, $p\leq r$ , we use the embedding $\mathcal{B}^{s}_{pq}\subset\mathcal{B}^{s^{\prime}}_{r\infty}$ , where we recall that $s^{\prime}=s-\frac{1}{p}+\frac{1}{r}$ . Then, it holds

[TABLE]

Moreover, with our choice of $j_{1}$ ,

[TABLE]

and the sum on the right-hand side is bounded from above by the claimed rate.

C.2. Bounds for the terms $e_{bs}$ , $e_{bb}$ , $e_{sb}$ , and $e_{ss}$

Consider the event $A_{jk}$ defined via $A_{jk}=\{\lvert\widehat{\beta}_{jk}-\beta_{jk}\rvert>K/2\cdot t_{j,n,\alpha}\}$ . The concentration inequality (C.5) for this event as well as the bound (C.6) will be used frequently in the sequel without further reference. In the following, we bound the terms $\mathbb{E}\lVert e_{bs}\rVert_{r}^{r}$ , $\mathbb{E}\lVert e_{bb}\rVert_{r}^{r}$ , $\mathbb{E}\lVert e_{sb}\rVert_{r}^{r}$ , and $\mathbb{E}\lVert e_{ss}\rVert_{r}^{r}$ separately.

C.2.1. Bound for $e_{bs}$

By the Cauchy-Schwarz inequality and the fact that $\widehat{B}_{j}\cap S_{j}\subset A_{jk}$ ,

[TABLE]

and this term is bounded from above by the claimed rate provided that $\gamma\geq 2r$ .

C.2.2. Bound for $e_{sb}$

Using the relation $\widehat{S}_{j}\cap B^{\prime}_{j}\subset A_{jk}$ , we obtain

[TABLE]

In the considered case $p\leq r$ , we exploit the embedding $\mathcal{B}^{s}_{pq}\subseteq\mathcal{B}^{s^{\prime}}_{rq}$ with $s^{\prime}=s-\frac{1}{p}+\frac{1}{r}$ to get the bound

[TABLE]

Hence,

[TABLE]

by the definition of $j_{0}$ . Noting that

[TABLE]

provided that $\gamma$ is large enough ( $\gamma\geq r(N+1)$ is sufficient), shows that $\mathbb{E}\lVert e_{sb}\rVert_{r}^{r}$ is at most of the same order as the claimed rate.

C.2.3. Bound for $e_{bb}$

Put $t_{j,n,\alpha}^{\prime}=\gamma j^{\nu+1/2}n^{-1/2}$ and $t_{j,n,\alpha}^{\prime\prime}=\gamma j^{\nu+1/2}(n\alpha^{2})^{-1/2}2^{j/2}$ . Note that $t_{j,n,\alpha}=t_{j,n,\alpha}^{\prime}\vee t_{j,n,\alpha}^{\prime\prime}$ . For any $\rho\geq 0$ , it holds

[TABLE]

As this argument shows, one can even choose distinct values of $\rho$ for different $j$ , which will be used in the following calculations. Note that

[TABLE]

and, if $\rho\leq p$ , by Hölder’s inequality

[TABLE]

In the sequel, we consider three different cases corresponding to the three regimes in the statement of Theorem 5.1.

1. Case: $p>r/(s+1)$

$\bullet$

Bound for $S_{1}$ : Set $q_{1}=r/(2s+1)$ and define $\kappa_{1}\in\mathbb{N}$ such that

[TABLE]

Choosing $\rho<q_{1}\leq p$ for the indices $j\in\llbracket j_{0},\kappa_{1}\rrbracket$ , we obtain (note that $s+1/2=r/(2q_{1})$ )

[TABLE]

Choosing $\rho=p$ for indices $j\in\llbracket\kappa_{1}+1,j_{1}\rrbracket$ , we obtain

[TABLE]

$\bullet$

Bound for $S_{2}$ : Set $q_{2}=r/(s+1)$ and define $\kappa_{2}\in\mathbb{N}$ such that

[TABLE]

Choosing $\rho<q_{2}\leq p$ for the indices $j\in\llbracket j_{0},\kappa_{2}\rrbracket$ , we obtain (note that $s+1=r/q_{2}$ )

[TABLE]

Choosing $\rho=p$ for indices $j\in\llbracket\kappa_{2}+1,j_{1}\rrbracket$ , we obtain

[TABLE]

2. Case: $p\in(r/(2s+1),r/(s+1)]$

$\bullet$

Bound for $S_{1}$ : The sum $S_{1}$ can be dealt with as in the first case, since the choices of $q_{1}$ and $\kappa_{1}$ from that case are still legitimate for $p\in(r/(2s+1),r/(s+1)]$ .

$\bullet$

Bound for $S_{2}$ : In order to bound $S_{2}$ in the second case, define $q_{2}$ and $\kappa_{2}$ via the relations

[TABLE]

To deal with the sum over $j\in\llbracket j_{0},\kappa_{2}\rrbracket$ , we take $\rho=p$ and obtain

[TABLE]

For the sum over indices $j\in\llbracket\kappa_{2}+1,j_{1}\rrbracket$ , we choose $\rho>q_{2}>p$ and obtain, by monotonicity of $\ell^{\omega}$ -norms in $\omega$ and putting $s^{\prime\prime}=s+1/2-1/p$ , that

[TABLE]

3. Case: $p\leq r/(2s+1)$

$\bullet$

Bound for $S_{1}$ : Put

[TABLE]

and choose $\kappa_{1}\in\mathbb{N}$ such that

[TABLE]

Then, taking $\rho=p$ for the indices $j\in\llbracket j_{0},\kappa_{1}\rrbracket$ in the first sum in (C.1), we obtain

[TABLE]

For the sum over indices $j\in\llbracket\kappa_{1}+1,j_{1}\rrbracket$ , we choose $\rho>q_{1}>p$ and obtain, by monotonicity of $\ell^{\omega}$ -norms in $\omega$ and putting $s^{\prime\prime}=s+1/2-1/p$ , that

[TABLE]

$\bullet$

Bound for $S_{2}$ : $S_{2}$ can be dealt with exactly as in the second case.

C.2.4. Bound for $e_{ss}$

For any $0\leq\rho\leq r$

[TABLE]

This term can be bounded from above by the right-hand side of (C.1), and we conclude in the same way as for the term $e_{bb}$ .

C.3. A concentration inequality for the $\widehat{\beta}_{jk}$

For our proof, we need concentration inequalities for the events

[TABLE]

for $K>0$ , where $j\in\llbracket j_{0},j_{1}\rrbracket$ and $k\in\mathcal{N}_{j}$ . Let us recall the two-sided Bernstein inequality (cf. [BLM13], Theorem 2.10).

Theorem C.1.

Let $Y_{1},\ldots,Y_{n}$ be independent real valued random variables. Assume that there exist some positive numbers $v$ and $c$ such that

[TABLE]

and for all integers $m\geq 3$

[TABLE]

Let $S=\sum_{i=1}^{n}(Y_{i}-\mathbb{E}[Y_{i}])$ , then for every positive $x$

[TABLE]

Using this inequality, we can prove the following result.

Proposition C.1.

For all $j\in\llbracket j_{0},j_{1}\rrbracket$ satisfying $j\leq n$ , for all $k\in\mathcal{N}_{j}$ , and for all $\gamma\geq 1$ we have

[TABLE]

where $\bar{c}$ is an upper bound for $\sup_{f\in\mathcal{D}^{s}_{pq}(L,T)}\|f\|_{\infty}$ and $\sigma=4c_{A}\|\psi\|_{\infty}(2\nu-1)/(\nu-1)$ appears in the privacy mechanism (3.2).

Remark C.2.

By Equation (15) in [Don+96], the choice $\bar{c}=L$ is admissible for $f\in\mathcal{D}^{s}_{pq}(L,T)$ .

Proof.

We will apply Bernstein’s inequality to the random variables $\{Z_{ijk}\}_{i=1,\ldots,n}$ . Using that $\psi_{jk}(X_{i})$ and $W_{ijk}$ are independent and that $\mathbb{E}[W_{ijk}]=0$ , we get for all $i\in\llbracket 1,n\rrbracket$

[TABLE]

where $\bar{c}>0$ depends on $L$ and is such that $\|f\|_{\infty}\leq\bar{c}$ for all $f$ in $\mathcal{B}^{s}_{pq}(L)$ with $s>\frac{1}{p}$. Let $m\geq 3$ be an integer. Using again that $\psi_{jk}(X_{i})$ and $W_{ijk}$ are independent, we get for all $i\in\llbracket 1,n\rrbracket$

[TABLE]

Conditions (C.2) and (C.3) are thus satisfied with $v=2n(\bar{c}+\sigma_{j})^{2}$ and $c=\bar{c}+\sigma_{j}$ , and according to Bernstein’s inequality (C.4) we have for all $x>0$

[TABLE]

Note that we have for all $j\in\llbracket j_{0},j_{1}\rrbracket$ ,

[TABLE]

where $\sigma=4c_{A}\|\psi\|_{\infty}(2\nu-1)/(\nu-1)$ appears in the definition of $\sigma_{j}$ in (3.2). Take $x=Cj$ with $C>0$ and note that $2\sqrt{Cj/n}+Cj/n\leq(2\sqrt{C}+C)\sqrt{j/n}$ if $j\leq n$. Consequently, we get for all $C>0$, for all $j\in\llbracket j_{0},j_{1}\rrbracket$ satisfying $j\leq n$ and for all $k\in\mathcal{N}_{j}$,

[TABLE]

Then, it suffices to take $C=2\ln(2)\gamma$ to obtain (C.5). ∎
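The elementary steps used at the end of this proof can be spelled out as follows. This is only a sketch: it assumes that the deviation term in Bernstein's inequality (C.4) has the standard form $\sqrt{2vx}+cx$ with tail probability $2e^{-x}$, as in [BLM13], and it uses the values $v=2n(\bar{c}+\sigma_{j})^{2}$ and $c=\bar{c}+\sigma_{j}$ stated above.

```latex
\begin{align*}
% Step 1: insert v = 2n(\bar{c}+\sigma_j)^2 and c = \bar{c}+\sigma_j
\sqrt{2vx}+cx
  &= 2(\bar{c}+\sigma_{j})\sqrt{nx}+(\bar{c}+\sigma_{j})x
   = n(\bar{c}+\sigma_{j})\Bigl(2\sqrt{x/n}+x/n\Bigr), \\
% Step 2: for 1 \le j \le n we have j/n \le \sqrt{j/n}; apply this with x = Cj
2\sqrt{Cj/n}+Cj/n
  &= 2\sqrt{C}\sqrt{j/n}+C\,(j/n)
   \leq (2\sqrt{C}+C)\sqrt{j/n}, \\
% Step 3: the choice C = 2\ln(2)\gamma turns the tail into a power of 2
2e^{-Cj}
  &= 2e^{-2\gamma j\ln 2}
   = 2\cdot 2^{-2\gamma j}.
\end{align*}
```

Step 1 explains why, after division by $n$, the threshold is proportional to $2\sqrt{x/n}+x/n$; Step 2 is where the restriction $j\leq n$ enters; Step 3 shows how the choice $C=2\ln(2)\gamma$ produces a tail that decays geometrically in $j$.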

C.4. Moment bounds and norm inequalities

Consider an arbitrary random function

[TABLE]

Putting

[TABLE]

it has been shown in [Don+96] that, for arbitrary $v\in\mathbb{R}$ and $u=r/(r-2)$, we have

[TABLE]

As in [Don+96], adopting the formal convention $S^{0}=1$, it suffices to consider the second inequality for all $r\geq 1$ (setting $v=0$ for the case $r\leq 2$). Thus, for any $r\geq 1$,

[TABLE]

Consider again the decomposition $\widehat{\beta}_{jk}=\beta_{jk}^{\prime}+\frac{\sigma_{j}}{n}\sum_{i=1}^{n}W_{ijk}$ . We have, for any $m\geq 1$ ,

[TABLE]

In [Don+96], p. 520, Equation (16), it is shown that

[TABLE]

provided that $2^{j}\leq n$ with a constant $c$ depending only on $s$ , $p$ , $q$ , $L$ , $\|\psi\|_{m}$ and $m$ . In addition, by Rosenthal’s inequality, it can be shown for any $m\geq 1$ that

[TABLE]

Combining (C.7) and (C.8) yields

[TABLE]
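The decomposition $\widehat{\beta}_{jk}=\beta_{jk}^{\prime}+\frac{\sigma_{j}}{n}\sum_{i=1}^{n}W_{ijk}$ used in this section can be illustrated with a small simulation. The sketch below is not the paper's mechanism: it assumes the Haar wavelet on $[0,1)$, uniformly distributed data (so that the true coefficient is zero), standard Laplace noise, an arbitrary numerical value for $\sigma_{j}$, and it reads $\beta_{jk}^{\prime}$ as the non-private empirical coefficient $\frac{1}{n}\sum_{i}\psi_{jk}(X_{i})$. The names `haar_psi`, `laplace` and `private_coefficient` are hypothetical.

```python
import math
import random


def haar_psi(j, k, x):
    """Haar wavelet psi_jk(x) = 2^{j/2} psi(2^j x - k), with psi = 1 on [0, 1/2), -1 on [1/2, 1)."""
    t = (2 ** j) * x - k
    if 0.0 <= t < 0.5:
        return 2 ** (j / 2)
    if 0.5 <= t < 1.0:
        return -(2 ** (j / 2))
    return 0.0


def laplace(rng):
    # Standard Laplace draw (density exp(-|w|)/2), as a difference of two Exp(1) draws.
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def private_coefficient(xs, j, k, sigma_j, rng):
    """Return (beta_prime, noise_term, beta_hat) with beta_hat = beta_prime + noise_term."""
    n = len(xs)
    beta_prime = sum(haar_psi(j, k, x) for x in xs) / n      # empirical coefficient
    noise_term = sigma_j / n * sum(laplace(rng) for _ in xs)  # averaged Laplace noise
    return beta_prime, noise_term, beta_prime + noise_term


rng = random.Random(0)
n, j, k, sigma_j = 2000, 2, 1, 3.0        # sigma_j is an arbitrary illustrative value
xs = [rng.random() for _ in range(n)]     # uniform data: true coefficient is 0
beta_prime, noise_term, beta_hat = private_coefficient(xs, j, k, sigma_j, rng)

# A standard Laplace variable has variance 2, so the noise term has
# standard deviation sigma_j * sqrt(2 / n).
noise_sd = sigma_j * math.sqrt(2.0 / n)
print(f"beta_prime = {beta_prime:.4f}, noise = {noise_term:.4f}, noise sd = {noise_sd:.4f}")
```

In this toy example both parts of the decomposition are of order $n^{-1/2}$, with the noise part scaled by $\sigma_{j}$, which is the structure that the moment bounds of this section exploit.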

Acknowledgements

The authors gratefully acknowledge financial support from GENES. Cristina Butucea and Martin Kroll also gratefully acknowledge financial support from the French National Research Agency (ANR) under the grant Labex Ecodec (ANR-11-LABEX-0047).

Bibliography

[BLM13] Stéphane Boucheron, Gábor Lugosi and Pascal Massart “Concentration inequalities: A nonasymptotic theory of independence”, with a foreword by Michel Ledoux, Oxford University Press, Oxford, 2013, pp. x+481. DOI: 10.1093/acprof:oso/9780199535255.001.0001
[DJW13] John C. Duchi, Michael I. Jordan and Martin J. Wainwright “Local privacy and minimax bounds: sharp rates for probability estimation”, 2013. URL: https://arxiv.org/abs/1305.6000
[DJW18] John C. Duchi, Michael I. Jordan and Martin J. Wainwright “Minimax optimal procedures for locally private estimation” In J. Amer. Statist. Assoc. 113.521, 2018, pp. 182–201. DOI: 10.1080/01621459.2017.1389735
[DL91] David L. Donoho and Richard C. Liu “Geometrizing rates of convergence. II, III” In Ann. Statist. 19.2, 1991, pp. 633–667, 668–701. DOI: 10.1214/aos/1176348114
[Don+96] David L. Donoho, Iain M. Johnstone, Gérard Kerkyacharian and Dominique Picard “Density estimation by wavelet thresholding” In Ann. Statist. 24.2, 1996, pp. 508–539. DOI: 10.1214/aos/1032894451
[GN16] Evarist Giné and Richard Nickl “Mathematical foundations of infinite-dimensional statistical models”, Cambridge Series in Statistical and Probabilistic Mathematics, [40], Cambridge University Press, New York, 2016, pp. xiv+690. DOI: 10.1017/CBO9781107337862
[Här+98] Wolfgang Härdle, Gerard Kerkyacharian, Dominique Picard and Alexander Tsybakov “Wavelets, approximation, and statistical applications”, Lecture Notes in Statistics 129, Springer-Verlag, New York, 1998, pp. xviii+265. DOI: 10.1007/978-1-4612-2222-4
[HRW13] Rob Hall, Alessandro Rinaldo and Larry Wasserman “Differential privacy for functions and functional data” In J. Mach. Learn. Res. 14, 2013, pp. 703–727
