Pointwise adaptive kernel density estimation under local approximate differential privacy

Martin Kroll

arXiv:1907.06233·math.ST·July 16, 2019

TL;DR

This paper develops a pointwise adaptive kernel density estimator under local approximate differential privacy, achieving near-optimal convergence rates while ensuring data privacy at the individual level.

Contribution

It introduces a privacy-preserving kernel density estimation method with adaptive bandwidth selection, providing theoretical guarantees and optimal convergence rates under local differential privacy.

Findings

01. Optimal convergence rate under privacy: $n^{-(2s-1)/(2s+1)}$

02. Adaptive estimator attains near-optimal rate with logarithmic factors

03. Method compatible with multiple statistical procedures in privacy-preserving data analysis

Equations

$$D_{\varphi}(\mathbf{P}|\!|\mathbf{Q})=\int\varphi\left(\frac{\mathrm{d}\mathbf{P}}{\mathrm{d}\mathbf{Q}}\right)\mathrm{d}\mathbf{Q}=\int\varphi\left(\frac{p}{q}\right)q\,\mathrm{d}\mu$$

$$\sup_{x,x^{\prime}\in\mathfrak{X}}D_{\varphi}\big(Q(\cdot\mid X=x)\,|\!|\,Q(\cdot\mid X=x^{\prime})\big)\leqslant\alpha.$$

$$Q(A\mid X=x)\leqslant\exp(\alpha)\,Q(A\mid X=x^{\prime})+\beta,$$

$$\Delta(g)=\sup_{x,x^{\prime}\in\mathfrak{X}}\lvert g(x)-g(x^{\prime})\rvert.$$

$$Z=g(X)+b\xi$$

$$K_{h}(X_{i}-t):=\frac{1}{h}K\left(\frac{X_{i}-t}{h}\right)$$

$$\sup_{x,x^{\prime}\in\mathfrak{X}}\lVert\Sigma^{-1/2}(g(x)-g(x^{\prime}))\rVert_{2}\leqslant\Delta$$

$$Z=g(X)+\sigma\Xi,\quad\Xi\sim\mathcal{N}(0,\Sigma),$$

$$\sigma\geqslant\frac{\Delta}{\alpha}\sqrt{2\log(1/(2\beta))+2\alpha}.$$

$$\sup_{x,x^{\prime}\in\mathbb{R}}\left\lvert\frac{1}{h}K\left(\frac{x-t}{h}\right)-\frac{1}{h}K\left(\frac{x^{\prime}-t}{h}\right)\right\rvert\leqslant\frac{2\lVert K\rVert_{\infty}}{h},$$

$$Z_{i}=\frac{1}{h}K\left(\frac{X_{i}-t}{h}\right)+\frac{2\lVert K\rVert_{\infty}\sqrt{2\log(1/(2\beta))+2\alpha}}{\alpha h}\,\xi_{i},\quad\xi_{i}\sim\mathcal{N}(0,1),$$

$$C_{\mathfrak{T},B}=\{f\colon\mathfrak{X}\to\mathbb{R}:(f(t_{1}),\ldots,f(t_{m}))\in B\}$$

$$\Sigma_{t_{1},\ldots,t_{m}}=\begin{pmatrix}K(t_{1},t_{1})&\cdots&K(t_{1},t_{m})\\ \vdots&\ddots&\vdots\\ K(t_{m},t_{1})&\cdots&K(t_{m},t_{m})\end{pmatrix}.$$

$$Z=X+\sigma\Xi$$

$$\sup_{f,g\in\mathfrak{F}}\ \sup_{m\in\mathbb{N}^{\ast}}\ \sup_{t_{1},\ldots,t_{m}\in\mathfrak{X}}\left\lVert\Sigma_{t_{1},\ldots,t_{m}}^{-1/2}\begin{pmatrix}f(t_{1})-g(t_{1})\\ \vdots\\ f(t_{m})-g(t_{m})\end{pmatrix}\right\rVert_{2}\leqslant\Delta$$

$$\sum_{i,j=1}^{k}a_{i}a_{j}K(x_{i},x_{j})\geqslant 0$$

$$\mathfrak{H}_{0}:=\Big\{f:f=\sum_{i\in I}c_{i}K_{x_{i}}\text{ for some finite index set }I\Big\}$$

$$\langle f,g\rangle_{\mathfrak{H}}=\sum_{i\in I}\sum_{j\in J}c_{i}d_{j}K(x_{i},y_{j})$$

$$\left\lVert\begin{pmatrix}K(t_{1},t_{1})&\cdots&K(t_{1},t_{m})\\ \vdots&\ddots&\vdots\\ K(t_{m},t_{1})&\cdots&K(t_{m},t_{m})\end{pmatrix}^{-1/2}\begin{pmatrix}f(t_{1})\\ \vdots\\ f(t_{m})\end{pmatrix}\right\rVert_{2}\leqslant\lVert f\rVert_{\mathfrak{H}}.$$

$$Z=X+\sigma\Xi$$

$$\sup_{f,g\in\mathfrak{F}}\lVert f-g\rVert_{\mathfrak{H}}\leqslant\Delta,$$

$$f_{i,h}(t)=\frac{1}{h}K\left(\frac{X_{i}-t}{h}\right),\quad t\in\mathbb{R},$$

$$Z_{i,h}(\cdot)=f_{i,h}(\cdot)+\frac{\Delta}{\alpha}\sqrt{2\log(1/(2\beta))+2\alpha}\;\Xi$$

$$\lVert K_{h}(x-\cdot)-K_{h}(x^{\prime}-\cdot)\rVert_{\mathfrak{H}}^{2}=\frac{1}{2\pi h^{2}}\big(K_{\text{Gauss}}(0)+K_{\text{Gauss}}(0)-2K_{\text{Gauss}}(x-x^{\prime})\big)\leqslant\frac{1}{\pi h^{2}},$$

$$K_{\sqsubset\sqsupset}(x,y)\propto\mathbf{1}_{\{\lvert x-y\rvert\leqslant 1\}}$$

$$\sum_{i=1}^{3}\sum_{j=1}^{3}a_{i}K_{\sqsubset\sqsupset}(x_{i},x_{j})a_{j}\propto 3-4<0,$$

$$K_{\triangle}(x,y)\propto(1-\lvert x-y\rvert)\mathbf{1}_{\{\lvert x-y\rvert\leqslant 1\}}$$

$$K(x,y)=\int_{\mathbb{R}^{d}}f(t+x)f(t+y)\,\mathrm{d}t$$

$$(1-\lvert x-y\rvert)\mathbf{1}_{\{\lvert x-y\rvert\leqslant 1\}}=\int_{\mathbb{R}}\mathbf{1}_{[-1/2,1/2]}(t+x)\,\mathbf{1}_{[-1/2,1/2]}(t+y)\,\mathrm{d}t.$$

$$K(x,y)\propto\exp(-\lvert x-y\rvert^{2}/2)$$

Full text

Pointwise adaptive kernel density estimation under local approximate differential privacy

Martin Kroll

CREST, ENSAE, Institut Polytechnique de Paris

prenom.nom[arobase]ensae.fr

Synopsis.

We consider non-parametric density estimation in the framework of local approximate differential privacy. In contrast to centralized privacy scenarios with a trusted curator, in the local setup anonymization must be guaranteed already on the individual data owners’ side and therefore must precede any data mining tasks. Thus, the published anonymized data should be compatible with as many statistical procedures as possible. We suggest adding Laplace noise and Gaussian processes (both appropriately scaled) to kernel density estimators to obtain approximately differentially private versions of the latter. We obtain minimax type results over Sobolev classes indexed by a smoothness parameter $s>1/2$ for the mean squared error at a fixed point. In particular, we show that taking the average of private kernel density estimators from $n$ different data owners attains the optimal rate of convergence if the bandwidth parameter is correctly specified. Notably, the optimal convergence rate in terms of the sample size $n$ is $n^{-(2s-1)/(2s+1)}$ under local differential privacy and is thus slower than the rate $n^{-(2s-1)/(2s)}$ which holds without privacy restrictions. Since the optimal choice of the bandwidth parameter depends on the smoothness $s$ and is thus not accessible in practice, adaptive methods for bandwidth selection are necessary and must, in the local privacy framework, be performed directly on the anonymized data. We address this problem by means of a variant of Lepski’s method tailored to the privacy setup and obtain general oracle inequalities for private kernel density estimators. In the Sobolev case, the resulting adaptive estimator attains the optimal rate of convergence at least up to extra logarithmic factors.

Key words and phrases:

Kernel density estimation. Approximate local differential privacy. Reproducing kernel Hilbert space. Adaptive estimation. Lepski’s method.

2010 Mathematics Subject Classification:

62G05 (primary), and 68P25 (secondary)

The author gratefully acknowledges financial support from GENES and from the French National Research Agency (ANR) under the grant Labex Ecodec (ANR-11-LABEX-0047).

1. Introduction

In the modern information era data are routinely collected in all areas of private and public life. Although the availability of massive data sets is essential to answer important scientific and societal questions, the individual data owners (who may be individuals, households, research institutions, companies, …) might refuse to share their, maybe sensitive, raw data with others. Even more, in view of regularly reported data leaks, they may not even want to entrust their data to a central curator who stores the data and publishes anonymized summary statistics. Finding ourselves in such a dilemma, the question whether and, if yes, how data analytics can still be performed is of special importance. For the evaluation of this question, several aspects have to be taken into account.

Firstly, in absence of a trusted curator, privacy of the data has to be achieved already locally at the individual data owners’ level. The $i$-th data holder takes its datum, say $X_{i}$, as the input of a privacy mechanism and creates an output $Z_{i}$ that is considered sufficiently anonymized, for instance, in the sense of any of the privacy definitions listed below. For the purpose of the present paper, a privacy mechanism is a Markov kernel $Q_{i}$ between measurable spaces $(\mathfrak{X},\mathscr{X})$ and $(\mathfrak{Z},\mathscr{Z})$ generating $Z_{i}$ given $X_{i}=x$ according to the distribution $Q^{Z_{i}\mid X_{i}=x}$. This definition of local privacy is in contrast to the framework of centralized or global privacy where the trusted curator can take the whole data set $\{X_{1},\ldots,X_{n}\}$ to create an output $Z$. (Footnote 1: Thus, the local privacy model can be seen as a proper submodel of the global one because the trusted curator can also mimic any conceivable procedure in the local model.)

Secondly, for the quantification of privacy, different solutions have been proposed so far (see [BD14], Section 2 for a comprehensive overview of existing privacy definitions):

$\bullet$

In this paper, we will exclusively work in the framework of $\alpha$ -differential privacy and its generalization $(\alpha,\beta)$ -differential privacy as defined in Definition 2.1 below. These two privacy definitions are also referred to as pure and approximate differential privacy, respectively. Originally, these concepts have been suggested for the anonymization of microdata tables in a global privacy setup, more precisely in a framework where queries are answered by a server that has direct access to the sensitive data [Dwo06, Dwo+06, Dwo08]. In the statistics community, working under privacy constraints has been popularized in the past decade, amongst others, through the articles [WZ10, HRW13] (in the global setup) and [DJW18] (in the local privacy setup). Another strict relaxation of pure differential privacy is random differential privacy as introduced in [HRW11].

$\bullet$

An alternative quantification of privacy can be given as follows: Let $\varphi$ be a function from $[0,\infty]$ to $\mathbb{R}\cup\{+\infty\}$ with $\varphi(1)=0$ . Then, the associated $\varphi$ -divergence between two distributions $\mathbf{P},\mathbf{Q}$ is

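$$D_{\varphi}(\mathbf{P}|\!|\mathbf{Q})=\int\varphi\left(\frac{\mathrm{d}\mathbf{P}}{\mathrm{d}\mathbf{Q}}\right)\mathrm{d}\mathbf{Q}=\int\varphi\left(\frac{p}{q}\right)q\,\mathrm{d}\mu,$$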

where $\mu$ is a measure such that $\mathbf{P},\mathbf{Q}\ll\mu$ and $p,q$ denote the corresponding Radon-Nikodym densities. Then, the mechanism $Q$ is called $\alpha$ - $\varphi$ -divergence private if

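$$\sup_{x,x^{\prime}\in\mathfrak{X}}D_{\varphi}\big(Q(\cdot\mid X=x)\,|\!|\,Q(\cdot\mid X=x^{\prime})\big)\leqslant\alpha.$$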

The intersection of these two concepts is non-empty: For instance, taking $\varphi(x)=\lvert x-1\rvert/2$, the $\varphi$-divergence $D_{\varphi}(\mathbf{P}|\!|\mathbf{Q})$ is the total variation distance, and the resulting $\alpha$-$\varphi$-divergence privacy is equivalent to $(0,\beta)$-differential privacy with $\beta=\alpha$.

Thirdly, the published data $Z_{1},\ldots,Z_{n}$ should ideally be multi-purpose in the sense that they can serve as input data for several types of analyses. Thus, when the unmasked data are for instance a sample from an unknown probability distribution, the anonymized data should contain as much information as possible about the whole distribution and not only about certain characteristics. One main motivation for this work is to introduce novel methodology in the framework of density estimation that aims to address also this issue by proposing a local approximate differential private version of kernel density estimators, that is, the whole function $t\mapsto K((X_{i}-t)/h)/h$ for a bandwidth parameter $h>0$ along with a study of their theoretical properties. Figure 1 gives a foretaste and provides a graphical representation of the general workflow developed in this paper.

Roadmap of the article

Throughout the article, we consider the paradigmatic example of non-parametric density estimation. For the sake of simplicity, we assume that each of $n$ data holders $D_{i}$ observes a size-one sample $X_{i}$ from a (in this paper) univariate target density $f$, but refuses to share this observation. In Section 2, we first introduce two mechanisms to mask the value of a kernel density estimator at a single fixed point $t\in\mathbb{R}$. The first approach is based on adding appropriately scaled Laplace noise. The second approach is based on adding Gaussian noise and can be extended, using the ideas introduced in [HRW13], to an anonymized version of the whole kernel density estimator (as a function from $\mathbb{R}$ to $\mathbb{R}$) via perturbation by a suitable Gaussian process. In Section 3, we consider estimation of the unknown density function under approximate differential privacy from a minimax point of view. As the performance measure to evaluate arbitrary estimators, we consider the mean squared error at a fixed point. Both the Laplace and the Gaussian perturbation approach attain the optimal rate $n^{-(2s-1)/(2s+1)}$ in terms of $n$ over Sobolev ellipsoids with smoothness index $s$, which is slower than the rate $n^{-(2s-1)/(2s)}$ in the setup without privacy constraints. The Gaussian process approach, however, makes it possible to estimate the value of the density at any point of the observation window and not only at one single point that has to be chosen prior to the anonymization procedure. In addition, this approach enables the statistician to perform any kind of analysis that plugs kernel density estimators into other estimators. Investigating theoretical guarantees of such plug-in procedures, however, is outside the scope of this work and deferred to future research.

As usual for kernel density estimators, the choice of the bandwidth parameter is crucial. In the considered minimax framework over Sobolev classes, the optimal order of the bandwidth that leads to a rate optimal estimator depends on the smoothness index $s$ which is typically unknown. In Section 4, we apply a Lepski scheme tailored to the privacy framework to overcome this problem and obtain an adaptive choice of the bandwidth. This issue specifically arises in the local privacy setup since in the global framework the trusted curator can apply the existing plethora of methods for bandwidth selection on the unmasked data, and then only publish the resulting estimator with the adaptively determined bandwidth in its anonymized form. In order to perform the Lepski scheme, any data owner has to publish the kernel density estimator not only for one single bandwidth but for a finite set of potential bandwidths. Such a multiple output still guarantees the desired privacy condition provided that the additive noise is multiplied with a factor proportional to the number of potential bandwidths which is logarithmic in the number of data sources in our case. We derive general oracle type inequalities for the estimator resulting from the Lepski procedure adapted to the privacy framework. For the specific example of Sobolev ellipsoids, the rates of convergence are merely deteriorated by logarithmic factors with respect to the case of a priori known smoothness.

2. Privacy mechanisms

2.1. Definition of approximate differential privacy

Let $(\mathfrak{X},\mathscr{X})$ and $(\mathfrak{Z},\mathscr{Z})$ be measurable spaces. A privacy mechanism is a Markov kernel $Q:\mathfrak{X}\times\mathscr{Z}\to[0,1]$ with the interpretation that, given original data $X=x$ , an anonymized output is randomly drawn from the probability measure $Q(\cdot|X=x)$ . In the non-interactive setup that we are going to consider, we work under the following definition of approximate or $(\alpha,\beta)$ -differential privacy.

Definition 2.1.

Let $\alpha\geqslant 0,\beta\in[0,1]$ . We say that $Z\sim Q(\cdot\mid X)$ is a local $(\alpha,\beta)$ -differentially private view of $X$ if for all $x,x^{\prime}\in\mathfrak{X}$ , $A\in\mathscr{Z}$ the estimate

$$Q(A\mid X=x)\leqslant\exp(\alpha)\,Q(A\mid X=x^{\prime})+\beta\tag{1}$$

holds true.

Let us emphasize that in Definition 2.1 the spaces $(\mathfrak{X},\mathscr{X})$ and $(\mathfrak{Z},\mathscr{Z})$ do not need to coincide. In fact, in Example 2.9 the space $(\mathfrak{X},\mathscr{X})$ will be the real line equipped with its Borel sets and $(\mathfrak{Z},\mathscr{Z})$ a measurable space of random functions. In the literature, the case $\beta=0$ is also referred to as $\alpha$-differential privacy or pure differential privacy. Evidently, the privacy condition (1) becomes more restrictive for smaller values of the two parameters $\alpha$ and $\beta$. Although Definition 2.1 smoothly bridges the cases $\beta=0$ and $\beta>0$, the classical anonymization techniques used for $\beta=0$ and $\beta>0$ are essentially different: In the case $\beta=0$, Laplace perturbation as well as randomization techniques as considered in [DJW18, RS18] can be used. In the case $\beta>0$, adding appropriately scaled Gaussian noise has been suggested in [HRW13]. However, as proved in [HLM15], appropriately scaled Laplace noise can also lead to approximately differential private outputs (see Proposition 2.2 below). In the sequel, we discuss how to achieve approximate differential privacy by means of these classical subroutines and how they can be extended to deal with functional data as well.

2.2. Univariate output using Laplace noise

First, we consider the case that both the input and the output of the privacy mechanism are univariate and real-valued, that is $(\mathfrak{X},\mathscr{X})=(\mathfrak{Z},\mathscr{Z})=(\mathbb{R},\mathscr{B}(\mathbb{R}))$. For this case, we consider Laplace perturbation, which is also used to derive an upper bound in Section 3. More precisely, let $Y_{i}=g(X_{i})\in\mathbb{R}$ be a quantity derived from the $X_{i}$ that should be masked. Define the sensitivity of $g$ as

$$\Delta(g)=\sup_{x,x^{\prime}\in\mathfrak{X}}\lvert g(x)-g(x^{\prime})\rvert.$$

Recall that the univariate Laplace distribution, denoted by $\mathcal{L}(b)$, is given by the probability density function $p_{b}(x)=\frac{1}{2b}\exp(-\lvert x\rvert/b)$ (we include also the case $b=0$; then the Laplace distribution is, by convention, the Dirac measure concentrated at $0$). In particular, the variance of an $\mathcal{L}(b)$ distributed random variable is $2b^{2}$. The following proposition establishes approximate differential privacy by Laplace perturbation.

Proposition 2.2 (See [HLM15], Example 5).

Let $\alpha>0$ , $\beta\in[0,1]$ . Then

$$Z=g(X)+b\xi$$

with $\xi\sim\mathcal{L}(1)$ for $b\geqslant\Delta(g)/(\alpha-\log(1-\beta))$ provides an $(\alpha,\beta)$ -differential private view of $g(X)$ (and of $X$ as well).

A benefit of Proposition 2.2 in contrast to the often proposed perturbation by Gaussian noise to establish approximate differential privacy is that it allows one to deal with the cases $\beta=0$ and $\beta>0$ by the same approach. Moreover, letting the parameter $\beta$ vary permits natural interpretations: If $\beta=0$, the variance of $\sqrt{2}b\xi$ corresponds to the one that is usually encountered in the case of pure differential privacy. When $\beta$ tends to one, the privacy constraint gets weaker and the variance of the centred noise $\sqrt{2}b\xi$ tends to $0$. In the extreme case $\beta=1$ it is even allowed to publish $g(X)$ directly.
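As an illustration of Proposition 2.2, the following minimal Python sketch computes the smallest admissible scale $b=\Delta(g)/(\alpha-\log(1-\beta))$ and applies it to a value $g(X)$. The function names `laplace_scale`, `laplace_noise` and `privatize` are hypothetical and not taken from the paper; the sketch assumes $\alpha>0$ and treats $\beta=1$ as the noise-free limiting case.

```python
import math
import random


def laplace_scale(sensitivity, alpha, beta):
    """Smallest scale b from Proposition 2.2: Delta(g) / (alpha - log(1 - beta)).

    Assumes alpha > 0 and 0 <= beta <= 1 (illustrative helper, not from the paper).
    """
    if beta >= 1.0:
        return 0.0  # limiting case beta = 1: g(X) may be published directly
    return sensitivity / (alpha - math.log(1.0 - beta))


def laplace_noise(rng):
    """Draw xi ~ L(1) as the difference of two independent Exp(1) variables."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def privatize(value, sensitivity, alpha, beta, rng):
    """Return Z = g(X) + b * xi for the already computed value g(X)."""
    b = laplace_scale(sensitivity, alpha, beta)
    return value + b * laplace_noise(rng)


rng = random.Random(0)
z = privatize(0.7, sensitivity=2.0, alpha=1.0, beta=0.1, rng=rng)
```

For $\beta=0$ the scale reduces to the familiar $\Delta(g)/\alpha$ of pure differential privacy, and it shrinks monotonically as $\beta$ grows.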

We now introduce kernel density estimators as the guiding example that we have in mind for the function $g$ for the rest of the paper.

Example 2.3.

Let $X_{1},\ldots,X_{n}$ be i.i.d. according to an unknown probability density function $f\colon\mathbb{R}\to\mathbb{R}$. Let $t\in\mathbb{R}$ be fixed. Then the $i$-th data holder, who observes $X_{i}\in\mathbb{R}$, can compute

$$K_{h}(X_{i}-t):=\frac{1}{h}K\left(\frac{X_{i}-t}{h}\right)$$

for a bounded kernel function $K$, that is, $K\colon\mathbb{R}\to\mathbb{R}$ is bounded and integrable with $\int K(u)\mathrm{d}u=1$. The quantity $K_{h}(X_{i}-t)$ will play the role of $g(X)$ in Proposition 2.2. By the triangle inequality, $\Delta(K_{h}(\cdot-t))\leqslant 2\lVert K\rVert_{\infty}/h$, so one can take any $b\geqslant 2\lVert K\rVert_{\infty}/(h(\alpha-\log(1-\beta)))$ to obtain an approximate differential private view of $K_{h}(X_{i}-t)$. Note that $t\in\mathbb{R}$ has been fixed in advance before the anonymization procedure.
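The construction of Example 2.3 can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the Gaussian kernel (so $\lVert K\rVert_{\infty}=1/\sqrt{2\pi}$), simulated standard normal data, and the hypothetical function name `private_kde_at_point`; averaging the privatized values over the $n$ data holders is the aggregation step described in the synopsis.

```python
import math
import random


def gaussian_kernel(u):
    """Standard Gaussian density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


K_SUP = 1.0 / math.sqrt(2.0 * math.pi)  # sup norm of the Gaussian kernel


def private_kde_at_point(xs, t, h, alpha, beta, rng):
    """Average of the locally privatized values Z_i = K_h(X_i - t) + b * xi_i.

    Requires alpha > 0 and 0 <= beta < 1.
    """
    # Scale from Example 2.3: b = 2 ||K||_inf / (h (alpha - log(1 - beta))).
    b = (2.0 * K_SUP / h) / (alpha - math.log(1.0 - beta))
    total = 0.0
    for x in xs:
        clean = gaussian_kernel((x - t) / h) / h          # K_h(X_i - t)
        xi = rng.expovariate(1.0) - rng.expovariate(1.0)  # xi_i ~ L(1)
        total += clean + b * xi                           # computed by data holder i
    return total / len(xs)


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # simulated raw data
estimate = private_kde_at_point(sample, t=0.0, h=0.3, alpha=1.0, beta=0.1, rng=rng)
```

In a real deployment each data holder would only ever transmit its own $Z_{i}$; the loop above merely simulates all holders in one process.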

2.3. Multivariate output

In principle, also multivariate output could be dealt with by adding independent Laplace noise to any of the components of the vector to be published. In this case, both $\alpha$ and $\beta$ for each component have to be appropriately scaled in order to obtain the desired level of approximate differential privacy for the whole vector (the scaling can be carried out, for instance, as described in Lemma 2.16 below). This approach, however, results in an increase concerning the Laplace noise added at any single point where the kernel density estimator is evaluated, and thus might deteriorate the performance of subsequent analyses more than necessary. We do not further pursue this course here, since we will introduce a method for the anonymization of functional data that does not inflate the noise at single points in the next subsection. Having stated this general method, we can, for instance, anonymize the whole function $\cdot\mapsto K_{h}(X_{i}-\cdot)$ in Example 2.3, and as a by-product we obtain $(\alpha,\beta)$ -differential privacy for all pointwise evaluations $K_{h}(X_{i}-t)$ , $t\in\mathbb{R}$ without any extra cost on the noise to be added. To achieve anonymization of functional data, adding Gaussian processes with appropriately chosen covariance structure turns out to be convenient. This idea has been originally suggested in [HRW13], but we state the essential steps here again for a clear exposition, and refer to [HRW13] only for the proofs. The first stopover on our way along the results from [HRW13] is the following proposition that provides a condition under which approximate differential privacy of a vector is obtained by adding multivariate Gaussian noise with not necessarily uncorrelated components.

Proposition 2.4.

Let $\alpha>0$ , $\beta\in(0,1/2)$ . Let further $\Sigma\in\mathbb{R}^{m\times m}$ be a positive definite matrix and $g:\mathfrak{X}\to\mathbb{R}^{m}$ for some $m\in\mathbb{N}^{\ast}$ . Assume that

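$$\sup_{x,x^{\prime}\in\mathfrak{X}}\lVert\Sigma^{-1/2}(g(x)-g(x^{\prime}))\rVert_{2}\leqslant\Delta\tag{2}$$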

for all $x,x^{\prime}\in\mathfrak{X}$ . Then, $Z$ defined via

$$Z=g(X)+\sigma\Xi,\quad\Xi\sim\mathcal{N}(0,\Sigma),$$

is $(\alpha,\beta)$ -differential private provided that

$$\sigma\geqslant\frac{\Delta}{\alpha}\sqrt{2\log(1/(2\beta))+2\alpha}.\tag{3}$$

Proposition 2.4 will unfold its full potential in the next subsection where the condition (2) will be reformulated appropriately. For the univariate case (taking $m=1$ ), Proposition 2.4 directly provides a result similar to the one in Example 2.3, again with $t\in\mathbb{R}$ fixed before anonymization.

Example 2.5.

We consider $K_{h}(X_{i}-t)$ as in Example 2.3 and apply Proposition 2.4 for $m=1$ and $\Sigma=\begin{pmatrix}1\end{pmatrix}$ . As in Example 2.3,

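$$\sup_{x,x^{\prime}\in\mathbb{R}}\left\lvert\frac{1}{h}K\left(\frac{x-t}{h}\right)-\frac{1}{h}K\left(\frac{x^{\prime}-t}{h}\right)\right\rvert\leqslant\frac{2\lVert K\rVert_{\infty}}{h},$$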

and one can take $\Delta(K_{h}(\cdot-t))=2\lVert K\rVert_{\infty}/h$ in (2). Then, Proposition 2.4 guarantees that the $Z_{i}$ , $i=1,\ldots,n$ defined through

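$$Z_{i}=\frac{1}{h}K\left(\frac{X_{i}-t}{h}\right)+\frac{2\lVert K\rVert_{\infty}\sqrt{2\log(1/(2\beta))+2\alpha}}{\alpha h}\,\xi_{i},\quad\xi_{i}\sim\mathcal{N}(0,1),$$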

is an $(\alpha,\beta)$-differential private view of $K_{h}(X_{i}-t)$ (and of $X_{i}$ as well) for $\alpha>0$ and $\beta\in(0,1/2)$.
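The Gaussian counterpart of the pointwise mechanism can be sketched in the same way. The code below assumes that condition (3) has the square-root form $\sigma\geqslant(\Delta/\alpha)\sqrt{2\log(1/(2\beta))+2\alpha}$ and uses the sensitivity bound $\Delta=2\lVert K\rVert_{\infty}/h$ from Example 2.5; the function names `gaussian_sigma` and `privatize_gaussian` are hypothetical.

```python
import math
import random


def gaussian_sigma(delta, alpha, beta):
    """Noise level (Delta / alpha) * sqrt(2 log(1 / (2 beta)) + 2 alpha).

    The square-root form of condition (3) is an assumption of this sketch.
    """
    if not (alpha > 0.0 and 0.0 < beta < 0.5):
        raise ValueError("requires alpha > 0 and 0 < beta < 1/2")
    return (delta / alpha) * math.sqrt(2.0 * math.log(1.0 / (2.0 * beta)) + 2.0 * alpha)


def privatize_gaussian(kernel_value, k_sup, h, alpha, beta, rng):
    """Return Z_i = K_h(X_i - t) + sigma * xi_i with xi_i ~ N(0, 1)."""
    delta = 2.0 * k_sup / h  # sensitivity bound from Example 2.5
    return kernel_value + gaussian_sigma(delta, alpha, beta) * rng.gauss(0.0, 1.0)


rng = random.Random(0)
z_i = privatize_gaussian(kernel_value=1.2, k_sup=1.0 / math.sqrt(2.0 * math.pi),
                         h=0.3, alpha=1.0, beta=0.1, rng=rng)
```

Unlike the Laplace scale of Proposition 2.2, this noise level is only defined for $\beta\in(0,1/2)$, which is why the helper rejects other values.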

2.4. From multivariate to functional output

The anonymization techniques used in Examples 2.3 and 2.5 both suffer from the drawback that the output $Z_{i}$ provides information on the kernel density estimator $K_{h}(X_{i}-t)$ for one single $t$ only. The aim of this section, based on Proposition 2.4 and ideas introduced in [HRW13] in the context of global privacy, is to construct a privatized version of the whole function $t\mapsto K_{h}(X_{i}-t)$ by adding a suitable Gaussian process to the kernel density estimator. As a consequence, the kernel density estimator anonymized in this vein can be evaluated at any single $t\in\mathbb{R}$ .

For univariate and multivariate real-valued outputs of privacy mechanisms, the role of the $\sigma$-field $\mathscr{Z}$ in Definition 2.1 is canonically taken by the Borel sets on $\mathbb{R}$ or $\mathbb{R}^{m}$. In the case of functional output $Z\colon\mathfrak{X}\to\mathbb{R}^{m}$ (where $\mathfrak{X}$ is an arbitrary set), its role is taken by the $\sigma$-field $\mathscr{C}$ which is generated by the cylinder sets

$$C_{\mathfrak{T},B}=\{f\colon\mathfrak{X}\to\mathbb{R}:(f(t_{1}),\ldots,f(t_{m}))\in B\}$$

where $\mathfrak{T}$ ranges over all finite sets $\mathfrak{T}=\{t_{1},\ldots,t_{m}\}\subseteq\mathfrak{X}$ and $B\in\mathscr{B}(\mathbb{R}^{m})$ . The following result is a reformulation of Proposition 7 in [HRW13] and we omit its proof. See also Example 4 in [HLM15] for an alternative reasoning.

Proposition 2.6.

Let $\Xi:\mathfrak{X}\to\mathbb{R}$ be a sample path of a centred Gaussian process with covariance kernel $K:\mathfrak{X}\times\mathfrak{X}\to\mathbb{R}$ . For $t_{1},\ldots,t_{m}\in\mathfrak{X}$ , consider the Gram matrix

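$$\Sigma_{t_{1},\ldots,t_{m}}=\begin{pmatrix}K(t_{1},t_{1})&\cdots&K(t_{1},t_{m})\\ \vdots&\ddots&\vdots\\ K(t_{m},t_{1})&\cdots&K(t_{m},t_{m})\end{pmatrix}.$$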

Let $X\colon\mathfrak{X}\to\mathbb{R}$ be a (random) function in a function class $\mathfrak{F}$ . Then, the release of

$$Z=X+\sigma\Xi$$

with $\sigma$ fulfilling (3) is $(\alpha,\beta)$ -differential private (with respect to $\mathscr{C}$ ) provided that

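$$\sup_{f,g\in\mathfrak{F}}\ \sup_{m\in\mathbb{N}^{\ast}}\ \sup_{t_{1},\ldots,t_{m}\in\mathfrak{X}}\left\lVert\Sigma_{t_{1},\ldots,t_{m}}^{-1/2}\begin{pmatrix}f(t_{1})-g(t_{1})\\ \vdots\\ f(t_{m})-g(t_{m})\end{pmatrix}\right\rVert_{2}\leqslant\Delta\tag{4}$$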

where $\Delta$ is defined in (2).

The main question arising from Proposition 2.6 is how condition (4), which looks unwieldy at first sight, can be verified. The solution consists in transferring the problem into a reproducing kernel Hilbert space (RKHS) setup. In fact, Proposition 2.6 can be applied effectively when the random functions to be masked belong to the RKHS which corresponds to the covariance kernel of the Gaussian process $\Xi$.

In order to formulate this next result from [HRW13], we need to introduce some basic notation concerning the considered RKHS (we refer the reader to [BT04] for a comprehensive introduction to RKHS theory). Let $K\colon\mathfrak{X}\times\mathfrak{X}\to\mathbb{R}$ be a positive definite kernel. Recall that a real-valued kernel $K\colon\mathfrak{X}\times\mathfrak{X}\to\mathbb{R}$ is positive definite if

$$\sum_{i,j=1}^{k}a_{i}a_{j}K(x_{i},x_{j})\geqslant 0\tag{5}$$

holds for any $k\in\mathbb{N}^{\ast}$ , $\{a_{1},\ldots,a_{k}\}\subseteq\mathbb{R}$ , and $\{x_{1},\ldots,x_{k}\}\subseteq\mathfrak{X}$ . For any $x\in\mathfrak{X}$ , define the function $K_{x}:\mathfrak{X}\to\mathbb{R}$ by $K_{x}(\cdot)=K(x,\cdot)$ . Then the set

$$\mathfrak{H}_{0}:=\Big\{f:f=\sum_{i\in I}c_{i}K_{x_{i}}\text{ for some finite index set }I\Big\}$$

is a pre-Hilbert space with respect to the norm $\lVert\cdot\rVert_{\mathfrak{H}}$ induced by the scalar product

$$\langle f,g\rangle_{\mathfrak{H}}=\sum_{i\in I}\sum_{j\in J}c_{i}d_{j}K(x_{i},y_{j})$$

for $f=\sum_{i\in I}c_{i}K_{x_{i}}$ , $g=\sum_{j\in J}d_{j}K_{y_{j}}$ . The RKHS corresponding to the kernel $K$ is the Hilbert space $\mathfrak{H}$ resulting from the completion of $\mathfrak{H}_{0}$ with respect to the RKHS norm $\lVert\cdot\rVert_{\mathfrak{H}}$ . The following two results are again taken from [HRW13].

Proposition 2.7 (See [HRW13], Proposition 8).

For $f\in\mathfrak{H}$ , where $\mathfrak{H}$ is the RKHS corresponding to the kernel $K:\mathfrak{X}\times\mathfrak{X}\to\mathbb{R}$ , and for any finite sequence $t_{1},\ldots,t_{m}$ of distinct points from $\mathfrak{X}$ , we have

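$$\left\lVert\begin{pmatrix}K(t_{1},t_{1})&\cdots&K(t_{1},t_{m})\\ \vdots&\ddots&\vdots\\ K(t_{m},t_{1})&\cdots&K(t_{m},t_{m})\end{pmatrix}^{-1/2}\begin{pmatrix}f(t_{1})\\ \vdots\\ f(t_{m})\end{pmatrix}\right\rVert_{2}\leqslant\lVert f\rVert_{\mathfrak{H}}.$$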

Corollary 2.8 (See [HRW13], Corollary 9).

For $X\in\mathfrak{F}\subseteq\mathfrak{H}$ , the release of

$$Z=X+\sigma\Xi$$

with $\sigma$ as in (3) is $(\alpha,\beta)$ -differential private with respect to $\mathscr{C}$ provided that

$$\sup_{f,g\in\mathfrak{F}}\lVert f-g\rVert_{\mathfrak{H}}\leqslant\Delta,\tag{6}$$

and $\Xi$ is the sample path of a centred Gaussian process with covariance kernel $K$ (given by the reproducing kernel of $\mathfrak{H}$).

We now apply Corollary 2.8 to kernel density estimators.

Example 2.9.

In the case of univariate density estimation the $i$-th data holder observes $X_{i}$ drawn from the target density $f$, and we want this data holder to be able to publish an approximately differential private version of the kernel density estimator

$$f_{i,h}(t)=\frac{1}{h}K\left(\frac{X_{i}-t}{h}\right),\quad t\in\mathbb{R},$$

based on the single observation $X_{i}$ only. In order to apply the above theory we have to assume that the kernel $K(x,y)=K(x-y)$ is also a positive definite kernel. (Footnote 2: We slightly abuse notation by denoting both the kernel of the kernel density estimator and the corresponding kernel $\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ given through $(x,y)\mapsto K(x-y)$ by the letter $K$.) Under this additional assumption, Corollary 2.8 shows that the perturbed kernel density estimator

$$Z_{i,h}(\cdot)=f_{i,h}(\cdot)+\frac{\Delta}{\alpha}\sqrt{2\log(1/(2\beta))+2\alpha}\;\Xi$$

where $\Xi$ is a Gaussian process with kernel $hK_{h}(x,y)=K((x-y)/h)$, ensures $(\alpha,\beta)$-differential privacy provided that (6) is satisfied. For instance, for the Gaussian kernel $K_{\text{Gauss}}(\cdot)=\exp(-(\cdot)^{2}/(2h^{2}))$ we have

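$$\lVert K_{h}(x-\cdot)-K_{h}(x^{\prime}-\cdot)\rVert_{\mathfrak{H}}^{2}=\frac{1}{2\pi h^{2}}\big(K_{\text{Gauss}}(0)+K_{\text{Gauss}}(0)-2K_{\text{Gauss}}(x-x^{\prime})\big)\leqslant\frac{1}{\pi h^{2}},$$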

and we can take $\Delta=1/(\sqrt{\pi}h)$ (the same argument working for any non-negative bounded kernel, and with a slight modification for any bounded kernel).
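To make Example 2.9 concrete, the following Python sketch evaluates a privatized Gaussian kernel density estimator on a finite grid. It is a simulation under several assumptions that are not part of the paper: the Gaussian process is only sampled at the grid points (via a hand-written Cholesky factorization), a small `jitter` is added to the diagonal for numerical stability, the noise level uses the square-root form $(\Delta/\alpha)\sqrt{2\log(1/(2\beta))+2\alpha}$ with $\Delta=1/(\sqrt{\pi}h)$, and the names `cholesky` and `private_kde_path` are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def private_kde_path(x, grid, h, alpha, beta, rng, jitter=1e-9):
    """Values of Z_{i,h}(t) = f_{i,h}(t) + sigma * Xi(t) for t in grid.

    Requires alpha > 0 and 0 < beta < 1/2.
    """
    delta = 1.0 / (math.sqrt(math.pi) * h)  # bound on the RKHS distance
    sigma = (delta / alpha) * math.sqrt(2.0 * math.log(1.0 / (2.0 * beta)) + 2.0 * alpha)
    # Covariance of Xi on the grid: K_Gauss(s - t) = exp(-(s - t)^2 / (2 h^2)).
    cov = [[math.exp(-((s - t) ** 2) / (2.0 * h * h)) + (jitter if i == j else 0.0)
            for j, t in enumerate(grid)]
           for i, s in enumerate(grid)]
    low = cholesky(cov)
    normals = [rng.gauss(0.0, 1.0) for _ in grid]
    path = [sum(low[i][k] * normals[k] for k in range(i + 1)) for i in range(len(grid))]
    # Non-private single-observation estimator f_{i,h}(t) with the Gaussian kernel.
    clean = [math.exp(-((x - t) ** 2) / (2.0 * h * h)) / (h * math.sqrt(2.0 * math.pi))
             for t in grid]
    return [c + sigma * p for c, p in zip(clean, path)]


rng = random.Random(1)
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
released = private_kde_path(x=0.2, grid=grid, h=0.5, alpha=1.0, beta=0.1, rng=rng)
```

The point of the functional approach is that the same noise level $\sigma$ is used regardless of how many grid points are evaluated; only the correlation structure of the noise depends on the grid.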

Let us emphasize that the property of positive definiteness is not satisfied for all kernels commonly used for kernel density estimators in non-parametric statistics. In the following, we discuss some popular examples.

Example 2.10.

The rectangular kernel given by

$$K_{\sqsubset\sqsupset}(x,y)\propto\mathbf{1}_{\{\lvert x-y\rvert\leqslant 1\}}$$

for $x,y\in\mathbb{R}$ is not positive definite. In order to see this, set $x_{1}=0$ , $x_{2}=\frac{3}{4}$ , $x_{3}=\frac{3}{2}$ , $a_{1}=a_{3}=1$ , and $a_{2}=-1$ . Then

$$\sum_{i=1}^{3}\sum_{j=1}^{3}a_{i}K_{\sqsubset\sqsupset}(x_{i},x_{j})a_{j}\propto 3-4<0,$$

which contradicts the defining property (5) of positive definite kernels.
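The counterexample can be checked numerically. The short Python sketch below evaluates the quadratic form $\sum_{i,j}a_{i}K(x_{i},x_{j})a_{j}$ for the rectangular kernel (taken with proportionality constant one, which does not affect the sign); the helper name `quadratic_form` is hypothetical.

```python
def quadratic_form(kernel, points, weights):
    """Return sum_i sum_j a_i K(x_i, x_j) a_j."""
    return sum(a * kernel(x, y) * b
               for x, a in zip(points, weights)
               for y, b in zip(points, weights))


def rectangular(x, y):
    """Rectangular kernel up to its normalizing constant."""
    return 1.0 if abs(x - y) <= 1.0 else 0.0


# Points and weights from Example 2.10.
value = quadratic_form(rectangular, [0.0, 0.75, 1.5], [1.0, -1.0, 1.0])
```

The three diagonal terms contribute $3$, the two neighbouring pairs contribute $-4$ in total, and the pair $(x_{1},x_{3})$ lies outside the support, so `value` is negative.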

Example 2.11.

The triangular kernel given by

$$K_{\triangle}(x,y)\propto(1-\lvert x-y\rvert)\mathbf{1}_{\{\lvert x-y\rvert\leqslant 1\}}$$

for $x,y\in\mathbb{R}$ is positive definite. This follows from the fact that kernels of the form

$$K(x,y)=\int_{\mathbb{R}^{d}}f(t+x)f(t+y)\,\mathrm{d}t$$

for $x,y\in\mathbb{R}^{d}$ with square integrable $f\colon\mathbb{R}^{d}\to\mathbb{R}$ are positive definite and

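$$(1-\lvert x-y\rvert)\mathbf{1}_{\{\lvert x-y\rvert\leqslant 1\}}=\int_{\mathbb{R}}\mathbf{1}_{[-1/2,1/2]}(t+x)\,\mathbf{1}_{[-1/2,1/2]}(t+y)\,\mathrm{d}t.$$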
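The convolution representation of the triangular kernel can be verified numerically. The sketch below assumes the indicator of an interval of length one, here $[-1/2,1/2]$, and approximates the integral by a midpoint Riemann sum on $[-3,3]$, which is adequate for $\lvert x\rvert,\lvert y\rvert\leqslant 2.5$; the function names are hypothetical.

```python
def indicator(u):
    """Indicator function of the interval [-1/2, 1/2]."""
    return 1.0 if -0.5 <= u <= 0.5 else 0.0


def overlap_integral(x, y, steps=20000, lo=-3.0, hi=3.0):
    """Midpoint Riemann sum for the integral of 1(t + x) * 1(t + y) over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * width
        total += indicator(t + x) * indicator(t + y)
    return width * total


def triangular(x, y):
    """Triangular kernel up to its normalizing constant."""
    return max(0.0, 1.0 - abs(x - y))


pairs = [(0.0, 0.0), (0.0, 0.3), (0.2, -0.5), (0.0, 1.2)]
errors = [abs(overlap_integral(x, y) - triangular(x, y)) for x, y in pairs]
```

Each entry of `errors` is of the order of the grid width, which illustrates that the triangular kernel is the overlap length of two shifted unit intervals.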

Example 2.12.

The Gaussian kernel

$$K(x,y)\propto\exp(-\lvert x-y\rvert^{2}/2)$$

and the exponential kernel

$$K(x,y)\propto\exp(-\lvert x-y\rvert)$$

are positive definite. More generally, kernels of the form $K(x,y)\propto\exp(-|x-y|^{\gamma})$ are positive definite if and only if $\gamma\in[0,2]$. This follows by combination of Theorem 2.2 and Exercise 2.13, (b) in [BCR84].

Example 2.13.

The $\operatorname{sinc}$ kernel given by

[TABLE]

is positive semidefinite since the $\operatorname{sinc}$ function is the characteristic function of the uniform distribution on the interval $[-1,1]$. The $\operatorname{sinc}$ kernel also attains negative values, but thanks to the estimate $1\geqslant\operatorname{sinc}(\cdot)\geqslant-0.3$ we have, in analogy to the calculation in Example 2.9,

[TABLE]

which yields a suitable bound for $\Delta$ in this example.

Example 2.14.

The Epanechnikov kernel

$$K(x,y)\propto(1-\lvert x-y\rvert^{2})\mathbf{1}_{\{\lvert x-y\rvert\leqslant 1\}}$$

is not positive definite. In order to see this, put $x_{1}=0$ , $x_{2}=1/2$ , $x_{3}=1$ , $a_{1}=a_{3}=-0.9$ and $a_{2}=1$ . Then,

[TABLE]

in contradiction to the defining property (5) of positive definite kernels.

Example 2.15.

The biweight kernel

$$K(x,y)\propto(1-\lvert x-y\rvert^{2})^{2}\mathbf{1}_{\{\lvert x-y\rvert\leqslant 1\}}$$

is not positive definite. To see this, put $x_{1}=1/4$ , $x_{2}=-1/4$ , $x_{3}=-3/4$ , and $x_{4}=1/2$ . Then, consider the matrix $M=(K(x_{i},x_{j}))_{i,j\in\llbracket 1,4\rrbracket}$ . We have

[TABLE]

and the matrix $\widetilde{M}$ is not positive definite, since for $v=\begin{pmatrix}0.7&-0.4&0.2&-0.5\end{pmatrix}^{\top}$

[TABLE]
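The same kind of check applies to Example 2.15. The sketch below works with the Gram matrix of the biweight kernel itself, under the assumed normalization $K(x,y)=\frac{15}{16}(1-(x-y)^{2})^{2}\mathbf{1}_{[-1,1]}(x-y)$; whether $\widetilde{M}$ in the example is exactly this matrix or a positive multiple of it is not assumed here, since a positive rescaling leaves the sign of $v^{\top}Mv$ unchanged.

```python
def biweight(u):
    """Biweight kernel profile (assumed normalization 15/16 (1 - u^2)^2 on [-1, 1])."""
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

# Points and test vector from Example 2.15.
points = [0.25, -0.25, -0.75, 0.5]
v = [0.7, -0.4, 0.2, -0.5]

# Gram matrix M = (K(x_i, x_j))_{i,j}.
gram = [[biweight(x - y) for y in points] for x in points]

# Quadratic form v^T M v.
value = sum(v[i] * gram[i][j] * v[j] for i in range(4) for j in range(4))

for row in gram:
    print(row)
print(value)
```

The quadratic form is slightly negative, which rules out positive definiteness.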

2.5. A composition lemma for approximate differential privacy

For kernel density estimation, bandwidth selection is usually a delicate issue and so it is in our local privacy setup. Whereas in the centralized setup existing methods can be applied by the trusted curator on the unmasked data, this is not possible in our local setup. Thus the data holders have to publish versions of the kernel density estimator for different bandwidths, and one has to adapt general strategies from the non-private framework to the one with local approximately differential private data. To do this under our privacy constraint it is necessary to understand how multiple outputs influence the defining condition of approximate differential privacy. The following lemma provides a result of this flavour and is known in the research literature on privacy for statistical databases. The setup is the following: Given the unmasked datum $X$ , the data owner does not only want to publish $Z_{1}=Z_{1}(X)$ but also $Z_{2}=Z_{2}(X)$ , i.e., the vector $(Z_{1},Z_{2})$ . The following result tells us how $\alpha$ and $\beta$ for the single components have to be scaled in order to obtain $(\alpha,\beta)$ -differential privacy for multiple outputs.

Lemma 2.16 (Composition lemma for $(\alpha,\beta)$ -differential privacy).

Let $Z_{i}$ , $i=1,2$ be $(\alpha_{i},\beta_{i})$ -differential private and conditionally (on $X$ ) independent views of $X$ , respectively. Then $Z=(Z_{1},Z_{2})$ is an $(\alpha_{1}+\alpha_{2},\beta_{1}+\beta_{2})$ -differential private view of $X$ .

Of course, Lemma 2.16 can be successively applied. For instance, if we want to publish $Z_{i,h}$ from the above examples for different $h$ in a finite set $\mathcal{H}$ , then $\alpha$ and $\beta$ should be replaced with $\alpha^{\prime}=\alpha/\#\mathcal{H}$ and $\beta^{\prime}=\beta/\#\mathcal{H}$ , respectively, in order to get differential privacy for $Z=(Z_{i,h})_{h\in\mathcal{H}}$ .
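The bookkeeping behind this rescaling is elementary and can be sketched as follows. The function names are hypothetical helpers introduced for illustration; the only facts used are the additive composition of Lemma 2.16 and the even split $\alpha^{\prime}=\alpha/\#\mathcal{H}$, $\beta^{\prime}=\beta/\#\mathcal{H}$.

```python
def split_privacy_budget(alpha, beta, num_outputs):
    """Evenly split an (alpha, beta) budget over num_outputs published views."""
    if num_outputs < 1:
        raise ValueError("num_outputs must be at least 1")
    return alpha / num_outputs, beta / num_outputs

def composed_budget(per_output_budgets):
    """Successive application of Lemma 2.16: alphas and betas add up."""
    total_alpha = sum(a for a, _ in per_output_budgets)
    total_beta = sum(b for _, b in per_output_budgets)
    return total_alpha, total_beta

# Example: eight bandwidths, overall budget (alpha, beta) = (1.0, 0.1).
alpha_prime, beta_prime = split_privacy_budget(1.0, 0.1, 8)
total = composed_budget([(alpha_prime, beta_prime)] * 8)
print(alpha_prime, beta_prime, total)
```

An even split is the choice made in the text; Lemma 2.16 would equally allow an uneven allocation as long as the per-output parameters sum to $(\alpha,\beta)$.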

3. Private minimax estimation

Minimax theory provides a standard framework to study convergence properties of estimators in non-parametric statistics [Tsy09]. In this section, we apply this general toolbox to the specific case of density estimation under privacy constraints. For fixed $t\in\mathbb{R}$ and any estimator $\widehat{\ell}$ of the linear functional $f(t)$ based on the private views $Z=\{Z_{1},\ldots,Z_{n}\}$ , we study its mean squared error

[TABLE]

The guiding principle of minimax theory is to look for estimators that perform best in a worst-case scenario. However, due to the privacy framework, we have not only the freedom of choosing the estimator $\widehat{\ell}$ but also the privacy mechanism $Q$ that generates the private outputs. Hence, following [DJW18], classical minimax theory has to be adapted and a natural quantity to consider is the private minimax risk

[TABLE]

where $\mathcal{P}$ is some function class containing probability densities and the infimum is taken over all local $(\alpha,\beta)$ -differential private Markov kernels $Q\in\mathcal{Q}_{\alpha,\beta}$ and all estimators based on the local approximate differential private views $Z$ of the corresponding original sample $X_{1},\ldots,X_{n}$ . We specify the function class $\mathcal{P}$ by so-called Sobolev ellipsoids $\mathcal{S}(s,L)$ that we define for $s>1/2$ and $L>0$ by means of

[TABLE]

which, for $s\in\mathbb{N}^{\ast}$ , is equivalent to the definition

[TABLE]

In the first definition, $\mathcal{F}[f]$ denotes the Fourier transform of the density $f$ , in the second one $f^{(s)}$ denotes the weak $s$ -th derivative of $f$ .

3.1. Upper bound

We first derive an upper bound on the minimax risk by specializing both the privacy mechanism $Q\in\mathcal{Q}_{\alpha,\beta}$ and the estimator of $f(t)$ . Concerning the privacy mechanism, we consider the mechanisms mapping $X_{i}$ to private views $Z_{i,h}$ of $K_{h}(X_{i}-t)$ from Section 2 for one single $h>0$ . More precisely, we consider the Laplace mechanism given through

[TABLE]

and the Gaussian process mechanism given through

[TABLE]

where $\Xi_{i,h}$ are i.i.d. Gaussian processes with covariance kernel $K((x-y)/h)$ and $\Delta^{\prime}$ is an upper bound on $\lVert(hK_{h})_{x}-(hK_{h})_{x^{\prime}}\rVert_{\mathfrak{H}}$ for $x,x^{\prime}\in\mathbb{R}$ . Given $Z_{1,h},\ldots,Z_{n,h}$ as in (7) or (8), a natural estimator of $f(t)$ is given by

[TABLE]
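A small simulation may help to see how the averaged private views behave. The sketch below makes several assumptions that are not part of the statement above: it uses a Gaussian kernel instead of the $\operatorname{sinc}$ -kernel, the convention $K_{h}(\cdot)=K(\cdot/h)/h$, a standard normal sample, and a placeholder constant `noise_const` in place of the privacy-dependent constant of the Laplace mechanism. The noise term has the form $\frac{C}{\sqrt{2}h}\xi$ with $\xi\sim\mathcal{L}(1)$ as in the decomposition used in Appendix C, where $\mathcal{L}(1)$ is assumed to have density $e^{-\lvert x\rvert}/2$.

```python
import math
import random

def gaussian_kernel(u):
    """Standard Gaussian density, used here in place of the sinc-kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def standard_laplace(rng):
    """Laplace variable with density exp(-|x|)/2 (difference of two Exp(1))."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def private_view(x, t, h, noise_const, rng):
    """K_h(x - t) plus centred Laplace noise of size noise_const / (sqrt(2) h)."""
    kernel_term = gaussian_kernel((x - t) / h) / h
    noise = noise_const / (math.sqrt(2.0) * h) * standard_laplace(rng)
    return kernel_term + noise

def private_kde(views):
    """Average of the private views published by the data owners."""
    return sum(views) / len(views)

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
views = [private_view(x, 0.0, 0.5, 0.5, rng) for x in sample]
estimate = private_kde(views)

# For this toy setup, E[K_h(X - 0)] is the N(0, 1 + h^2) density at 0.
target = 1.0 / math.sqrt(2.0 * math.pi * 1.25)
print(estimate, target)
```

Since the noise is centred, the average concentrates around $\mathbf{E}[K_{h}(X_{1}-t)]$, exactly as for the non-private kernel density estimator; only the variance is inflated.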

The following proposition provides an upper risk bound for this estimator specialized with the $\operatorname{sinc}$ -kernel over the Sobolev ellipsoids $\mathcal{S}(s,L)$ introduced above.

Proposition 3.1.

Consider the kernel density estimator $\widehat{f}_{h}(t)$ for some fixed $t\in\mathbb{R}$ where the kernel used in the anonymization procedure (7) or (8) is the $\operatorname{sinc}$ -kernel from Example 2.13. Then, for any $s>1/2$ ,

[TABLE]

for some $C=C(\alpha,\beta,L,s,\lVert f\rVert_{\infty},K_{\operatorname{sinc}})$ . In particular, setting $h=h^{\star}$ with $h^{\star}\asymp n^{-1/(2s+1)}$ , we obtain

[TABLE]

Since the noise added by the privacy mechanisms is centred, the bias term in the proof of Proposition 3.1 remains unchanged in comparison to the standard setup without privacy constraints. However, the variance term changes due to the additional Laplace or Gaussian noise, respectively, and the classical variance term $1/(nh)$ is joined by the additional term $1/(nh^{2})$ which is of higher order for $h\to 0$ . Consequently, the optimal bandwidth is no longer of order $n^{-1/(2s)}$ as in the standard setup but of the larger order $n^{-1/(2s+1)}$ . However, consistency of $\widehat{f}_{h}$ is already guaranteed if $h\to 0$ and $nh^{2}\to\infty$ simultaneously (in the standard density estimation setup one only needs $nh\to\infty$ in addition to $h\to 0$ ).
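The shift of the optimal bandwidth can be reproduced numerically. The sketch below assumes that, up to constants, the risk behaves like $h^{2s-1}+1/(nh)+1/(nh^{2})$ in the private case and like $h^{2s-1}+1/(nh)$ in the standard case, as suggested by the discussion above; all constants are set to one, which is a simplification made here. The minimizer is found by a ternary search in $\log h$ and its decay exponent in $n$ is estimated from two sample sizes.

```python
import math

def minimize_convex_in_log(objective, lo=1e-8, hi=1.0, iterations=200):
    """Ternary search over log(h); valid when objective(exp(u)) is convex in u."""
    a, b = math.log(lo), math.log(hi)
    for _ in range(iterations):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if objective(math.exp(m1)) < objective(math.exp(m2)):
            b = m2
        else:
            a = m1
    return math.exp(0.5 * (a + b))

def best_bandwidth(n, s, private):
    """Minimizer of the risk proxy (all constants set to one)."""
    def risk_proxy(h):
        value = h ** (2.0 * s - 1.0) + 1.0 / (n * h)
        if private:
            value += 1.0 / (n * h * h)
        return value
    return minimize_convex_in_log(risk_proxy)

def bandwidth_exponent(s, private, n_small=1e6, n_large=1e10):
    """Slope of log(h*) against log(n) between two sample sizes."""
    h_small = best_bandwidth(n_small, s, private)
    h_large = best_bandwidth(n_large, s, private)
    return math.log(h_large / h_small) / math.log(n_large / n_small)

print(bandwidth_exponent(2.0, private=True))   # close to -1/(2s+1)
print(bandwidth_exponent(2.0, private=False))  # close to -1/(2s)
```

For $s=2$ the two slopes are close to $-1/5$ and $-1/4$, matching the orders $n^{-1/(2s+1)}$ and $n^{-1/(2s)}$.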

3.2. Lower bound

The following result states a lower bound over Sobolev ellipsoids in the case of pure differential privacy ( $\beta=0$ ).

Proposition 3.2.

Let $\alpha>0$ be arbitrary. Then,

[TABLE]

where $C(\alpha)>0$ depends on the privacy parameter, and the infimum is taken over all estimators $\widehat{\ell}$ based on private views $Z_{1},\ldots,Z_{n}$ and privacy mechanisms providing $(\alpha,0)$ -differential privacy.

Remark 3.3.

The lower bound of Proposition 3.2 still holds true when one allows a slight amount of interaction between the data holders, namely when the distribution of every $Z_{i}$ is determined by $X_{i}$ and the previously masked values $Z_{1},\ldots,Z_{i-1}$ . The proof remains the same because the data processing inequality (14) from [DJW18] still holds true in this more general setup.

Proposition 3.2 shows that, regarding the privacy parameter $\alpha$ as an a priori fixed constant, the estimators $\widehat{f}_{h}(t)$ from Proposition 3.1 attain the optimal rate $n^{-(2s-1)/(2s+1)}$ in terms of $n$ under pure local differential privacy. Recall that without privacy restrictions the optimal rate over Sobolev ellipsoids is $n^{-(2s-1)/(2s)}$ (as mentioned in [But01], this rate can, other than by a reduction scheme as used in our proof, be easily obtained via the theory developed in [DL92], see also [Tsy98]). In this work, we consider the parameters $\alpha$ , $\beta$ as fixed and are interested in the behaviour of the rate as a function of $n$ only but remarks concerning $\alpha$ analogous to the ones made in [But+19] could be made (as in that paper, $\alpha$ and $\beta$ could also be allowed to vary with $n$ ). The optimal behaviour, however, of the rates in terms of the privacy parameters $\alpha$ and $\beta$ , especially if $\beta>0$ , remains an open issue.

4. Adaptation to unknown smoothness

The estimators of the previous section are not completely satisfying since the optimal choice $h^{\star}_{n}$ of the bandwidth, as usual in non-parametric statistics, depends on a priori knowledge of the smoothness of the unknown function $f$ . Such knowledge is usually not available in practice. At least, using the Gaussian process perturbation approach we relieved ourselves from the drawback of the Laplace method that one can privatize only one functional of the form $f(t)$ for one single $t$ that has to be fixed even before the anonymization. Note that this drawback is, for instance, also present in the mechanisms suggested in [RS18]. From this point of view, anonymization of the whole kernel density estimator via this approach should be preferred.

The purpose of this section is to address the remaining issue of adapting to the unknown smoothness of $f$ . In order to tackle this problem, we use a variant of Lepski’s method (see [LS97] for a general account in the Gaussian white noise model, and [Cav01] for an application to a tomography problem whose concise presentation has inspired ours). Recall again that the necessity of novel methodology for adaptive estimation is specific to the setup of local privacy since in the global case the trusted curator can choose the bandwidth in an adaptive way using all the data $X_{1},\ldots,X_{n}$ and, as a consequence, can build on the existing plethora of methods and theoretical results for this standard case; hence bandwidth selection does not provide any additional difficulty for centralized privacy since only the final output is anonymized. In our local setup, where the data owners publish their data prior to any data analysis, adaptation must be addressed separately. Note that the problem of adaptation has, to the best of the author’s knowledge, only been addressed in the recent work [But+19] so far, where the authors use wavelet estimators for density estimation on a compact interval. The approach in that paper is thus conceptually different from the one presented in the sequel.

We will apply Lepski’s method both on observations (7) where $t\in\mathbb{R}$ has been fixed a priori and on pathwise observations (8) from the Gaussian process approach that we evaluate at the point $t\in\mathbb{R}$ of interest. In order to apply Lepski’s method, the observations (7) and (8) must be available for different values of the bandwidth parameter $h$ , say $h\in\mathcal{H}_{n}$ . This can be realized using Lemma 2.16 provided that the privacy parameters $\alpha$ and $\beta$ are appropriately scaled. Thus, we can assume that $Z_{i,h}(t)$ are accessible for any $i\in\llbracket 1,n\rrbracket$ and $h\in\mathcal{H}_{n}$ if we replace $\alpha$ and $\beta$ by $\alpha^{\prime}=\alpha/\#\mathcal{H}_{n}$ and $\beta^{\prime}=\beta/\#\mathcal{H}_{n}$ , respectively. For any $h\in\mathcal{H}_{n}$ and $t\in\mathbb{R}$ , we can then consider the estimator defined in (9). In our case, we define the set of potential bandwidths by a geometric grid,

[TABLE]

where $a>1$ is a fixed constant, $\overline{h}_{n}$ is such that $a\log(\overline{h}_{n}\sqrt{n})/\sqrt{n}\leqslant\overline{h}_{n}\leqslant 1$ , and $\underline{h}_{n}$ satisfies $\underline{h}_{n}=(\log(\overline{h}_{n}\sqrt{n})\vee 1)/\sqrt{n}$ . In the sequel, we write $C_{\alpha^{\prime}\beta^{\prime}}$ for both $C_{\alpha^{\prime}\beta^{\prime}}^{\mathcal{L}}$ and $C_{\alpha^{\prime}\beta^{\prime}}^{\mathrm{GP}}$ . For $h\in\mathcal{H}_{n}$ and some $M>0$ , define

[TABLE]

where $C_{\alpha^{\prime}\beta^{\prime}}$ is defined as in Section 3. The proof of Proposition 3.1 shows that

[TABLE]

if $\lVert f\rVert_{\infty}\leqslant M$ . Put $\lambda(h)=\max(1,(\kappa\log(\overline{h}_{n}/h))^{1/2})$ with $\kappa$ being a sufficiently large constant (an explicit value can be determined from the proof of Theorem 4.3) and define

[TABLE]

If the set in the definition of $h^{\ast}_{n}$ is empty, we set $h^{\ast}_{n}=\underline{h}_{n}$ by convention. However, in the proof of Proposition 4.1 we will show that this set is non-empty for $n$ large enough. The bandwidth $h_{n}^{\ast}$ is an oracle in the sense that it is not accessible by the statistician since it depends on the unknown parameter $f$ . The definition of $h^{\ast}_{n}$ provides some kind of ideal criterion: The bandwidth $h$ is increased along the grid $\mathcal{H}_{n}$ as long as the bias term $\lvert f_{\eta}(t)-f(t)\rvert$ is bounded by the ‘rate’ $v(h)\lambda(h)$ , a procedure that aims at mimicking the classical bias-variance tradeoff. In order to state a risk bound for the pseudo-estimator $\widehat{f}_{h^{\ast}_{n}}$ , we further define

[TABLE]
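The objects entering the oracle can be made concrete in a few lines. The sketch below assumes that the geometric grid consists of the points $\overline{h}_{n}a^{-j}$ , $j=0,1,\ldots$ , that are not smaller than $\underline{h}_{n}$ ; this form of the grid is an assumption made here for illustration. The lower endpoint $\underline{h}_{n}$ and the factor $\lambda(h)$ follow the formulas given in the text, and the value of `kappa` is a placeholder since the text only requires $\kappa$ to be sufficiently large.

```python
import math

def lower_bandwidth(n, h_bar=1.0):
    """Lower endpoint (log(h_bar * sqrt(n)) v 1) / sqrt(n)."""
    return max(math.log(h_bar * math.sqrt(n)), 1.0) / math.sqrt(n)

def geometric_grid(n, a=2.0, h_bar=1.0):
    """Assumed grid: h_bar * a**(-j), j = 0, 1, ..., down to the lower endpoint."""
    h_low = lower_bandwidth(n, h_bar)
    grid = []
    h = h_bar
    while h >= h_low:
        grid.append(h)
        h /= a
    return grid

def penalty_factor(h, kappa, h_bar=1.0):
    """lambda(h) = max(1, sqrt(kappa * log(h_bar / h))) for h <= h_bar."""
    return max(1.0, math.sqrt(kappa * math.log(h_bar / h)))

grid = geometric_grid(10_000)
print(grid)
print([penalty_factor(h, kappa=4.0) for h in grid])
```

Under this assumed form the grid has only of order $\log n$ elements, so the rescaled privacy parameters $\alpha^{\prime}=\alpha/\#\mathcal{H}_{n}$ and $\beta^{\prime}=\beta/\#\mathcal{H}_{n}$ shrink slowly with $n$ .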

Proposition 4.1.

Consider the pseudo-estimator $\widehat{f}_{h^{\ast}_{n}}$ defined via (9) and (10) where $\alpha$ and $\beta$ are replaced with $\alpha^{\prime}$ and $\beta^{\prime}$ , respectively. Assume that

[TABLE]

Consider $\overline{h}_{n}=1$ . Then, for $n$ sufficiently large,

[TABLE]

uniformly for all $f$ with $\lVert f\rVert_{\infty}\leqslant M$ .

Remark 4.2.

Assumption (11) is satisfied in many cases. For instance, if $\int\lvert K(u)\rvert\mathrm{d}u<\infty$ , then (11) is a special case of Bochner’s lemma (see [Tsy04], Lemma 1.1). However, the $\operatorname{sinc}$ -kernel is not absolutely integrable and thus Bochner’s lemma cannot be applied. In this case, one can alternatively assume that $f$ belongs at least to some Sobolev space $\mathcal{S}(s,L)$ for some $s>1/2$ . Then, the analysis of the bias term as in the proof of Proposition 3.1 guarantees the validity of (11).

The pseudo estimator $\widehat{f}_{h^{\ast}_{n}}$ is a stopover on our road to an adaptive estimator. We now construct a genuine estimator of $f$ that aims at mimicking this oracle. For this, we first define

[TABLE]

Then, calculations similar to those in the proof of Proposition 3.1 show that

[TABLE]

if $\lVert f\rVert_{\infty}\leqslant M$ . For $h,\eta\in\mathcal{H}_{n}$ , put

[TABLE]

Then, we define an adaptive choice of the bandwidth parameter by

[TABLE]

This choice of the bandwidth is well-defined since the maximum is taken over a non-empty set. The definition of $\widehat{h}_{n}$ is characteristic of Lepski’s method [Lep90], and the motivation of this procedure is neatly described in [Cav01], p. 67: One chooses the largest bandwidth $h$ such that the difference between the two estimators $\widehat{f}_{h}$ and $\widehat{f}_{\eta}$ is not too large (in the sense of (12)) for all $\eta\leqslant h$ . Evidently, the motivation of this procedure is to mimic the trade-off between squared bias and variance in a purely data-driven manner. Note also that (12) provides, as well as the oracle version (10), a local choice of the bandwidth in the sense that $\widehat{h}_{n}$ depends on $t$ . Such a local criterion might result in a better adaptation to spatial inhomogeneity of the target density than global selection rules.
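The selection rule just described can be sketched generically. In the code below, the threshold is passed in as a function of $(h,\eta)$ because the exact form of $\psi(h,\eta)$ is not reproduced here; the function name, the toy estimates and the constant threshold are all made up for illustration and are not the paper's quantities.

```python
def lepski_select(bandwidths, estimates, threshold):
    """Largest h such that |f_h - f_eta| <= threshold(h, eta) for all eta <= h.

    bandwidths: iterable of candidate bandwidths (the grid).
    estimates:  dict mapping each bandwidth to the estimate at the point t.
    threshold:  non-negative function of (h, eta), a stand-in for psi(h, eta).
    """
    grid = sorted(bandwidths)
    admissible = [
        h
        for h in grid
        if all(
            abs(estimates[h] - estimates[eta]) <= threshold(h, eta)
            for eta in grid
            if eta <= h
        )
    ]
    # The smallest bandwidth is always admissible, so the set is non-empty.
    return max(admissible)

# Toy example: the estimate at h = 0.4 deviates strongly from the others.
toy_estimates = {0.1: 1.00, 0.2: 1.05, 0.4: 1.50}
selected = lepski_select(toy_estimates.keys(), toy_estimates, lambda h, eta: 0.1)
print(selected)
```

In the toy example the rule returns $0.2$ : the estimate at $h=0.4$ differs from the one at $\eta=0.1$ by more than the threshold, which is read as a sign that the bias has started to dominate.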

Theorem 4.3.

Consider the estimator $\widehat{f}_{\widehat{h}_{n}}$ defined via (9) and (12) where $Z_{i,h}(t)$ for $h\in\mathcal{H}_{n}$ are defined via (7) or (8) with $\alpha$ and $\beta$ replaced with $\alpha^{\prime}$ and $\beta^{\prime}$ , respectively. Then, uniformly for all $f$ with $\lVert f\rVert_{\infty}\leqslant M$ ,

[TABLE]

As a consequence, taking $\overline{h}_{n}=1$ , we obtain

[TABLE]

Remark 4.4.

By specifying Theorem 4.3 with the $\operatorname{sinc}$ -kernel and $\overline{h}_{n}=1$ , one obtains an adaptive estimator attaining the optimal rate of convergence over functions bounded by $M$ in a Sobolev ellipsoid up to an extra logarithmic factor. A logarithmic loss for adaptation is commonly accepted and even known to be indispensable for pointwise estimation in the non-private framework [BL96].

5. Discussion

We have suggested an approach to adaptive kernel density estimation via Lepski’s method in the framework of local approximate differential privacy. Although we have studied its theoretical properties in the prototypical example of univariate density estimation only, our methodology should also be transferable to the multivariate case. We also conjecture that it might be possible to extend our results to the case of general linear functionals (different from pointwise evaluation of the density function at a fixed point) as investigated in [GP00] via Lepski’s method in an inverse problem setup. Furthermore, our methodology might be applicable to obtain local private estimation procedures in functional data analysis. However, a lot of questions remain open: One drawback of our approach is that the perturbation by a Gaussian process provides only approximate differential privacy and cannot be extended to pure differential privacy. The creation of new methods for kernel estimators that overcome this restriction provides a further direction for future research. Moreover, the optimal power of the logarithmic factor in the adaptive rate of convergence deserves further investigation as well as the behaviour of the minimax optimal rates in terms of the privacy parameters $\alpha$ and $\beta$ .

Appendix A Proofs of Section 2

A.1. Proof of Proposition 2.2

Let $A\in\mathscr{B}(\mathbb{R})$ be arbitrary. It has to be shown that

$$\mathbf{P}^{Z|X=x}(A)\leqslant\exp(\alpha)\,\mathbf{P}^{Z|X=x^{\prime}}(A)+\beta$$

for any $x,x^{\prime}\in\mathfrak{X}$ . By the triangle inequality this holds true if

[TABLE]

and the latter holds true as soon as $1\leqslant\exp\left(\alpha-\Delta(g)/b\right)+\beta$ which is equivalent to $b\geqslant\Delta(g)/(\alpha-\log(1-\beta))$ .
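The resulting calibration of the Laplace scale can be sketched as follows. The code assumes that $b$ is the scale parameter of a Laplace distribution with density $\exp(-\lvert z\rvert/b)/(2b)$ and that the mechanism publishes $g(X)$ plus such noise; the function names are hypothetical.

```python
import math
import random

def laplace_scale(sensitivity, alpha, beta):
    """Smallest scale b with b >= sensitivity / (alpha - log(1 - beta))."""
    if not (alpha > 0.0 and 0.0 <= beta < 1.0):
        raise ValueError("need alpha > 0 and 0 <= beta < 1")
    return sensitivity / (alpha - math.log(1.0 - beta))

def laplace_mechanism(value, sensitivity, alpha, beta, rng):
    """Publish value + Laplace noise with the calibrated scale."""
    b = laplace_scale(sensitivity, alpha, beta)
    # Difference of two Exp(1) variables is standard Laplace.
    noise = b * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return value + noise

b_pure = laplace_scale(1.0, 1.0, 0.0)     # beta = 0 recovers sensitivity / alpha
b_approx = laplace_scale(2.0, 0.5, 0.1)   # beta > 0 allows a smaller scale
print(b_pure, b_approx)
print(laplace_mechanism(0.3, 2.0, 0.5, 0.1, random.Random(1)))
```

At the minimal scale the condition $1\leqslant\exp(\alpha-\Delta(g)/b)+\beta$ holds with equality, and for $\beta=0$ the calibration reduces to the familiar $b=\Delta(g)/\alpha$ .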

A.2. Proof of Proposition 2.4

We have to show that

$$\mathbf{P}^{Z|X=x}(A)\leqslant\exp(\alpha)\,\mathbf{P}^{Z|X=x^{\prime}}(A)+\beta$$

for all $A\in\mathscr{B}(\mathbb{R}^{m})$ . This condition is satisfied if the set where the ratio $\mathrm{d}\mathbf{P}^{Z|X=x}/\mathrm{d}\mathbf{P}^{Z|X=x^{\prime}}$ exceeds $\exp(\alpha)$ has probability bounded by $\beta$ under $\mathbf{P}^{Z|X=x}$ . We have

[TABLE]

and the condition $\frac{\mathrm{d}\mathbf{P}^{Z|X=x}(z)}{\mathrm{d}\mathbf{P}^{Z|X=x^{\prime}}(z)}>\exp(\alpha)$ is equivalent to

[TABLE]

which in turn can be reformulated as

[TABLE]

Set $\Omega=\{z\in\mathbb{R}^{m}:\eqref{EQ:DEF:COND:OMEGA}\text{ holds}\}$ and let $\xi$ denote a $\mathcal{N}(0,I_{m})$ distributed random variable where $I_{m}$ denotes the $m\times m$ -dimensional identity matrix. Then

[TABLE]

where $\nu$ is a univariate standard Gaussian random variable. We now use the standard estimate $\mathbf{P}(\nu\geqslant t)\leqslant e^{-t^{2}/2}/2$ whose right-hand side is smaller than $\beta\in(0,1/2)$ if $t^{2}\geqslant-2\log(2\beta)$ . We apply this estimate with $t=\frac{\sigma\alpha}{\Delta}-\frac{\Delta}{2\sigma}$ , and thus $\mathbf{P}^{Z|X=x}(\Omega)\leqslant\beta$ if

[TABLE]

and this holds at least if

[TABLE]
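The Gaussian tail estimate $\mathbf{P}(\nu\geqslant t)\leqslant e^{-t^{2}/2}/2$ used in this proof, together with the threshold $t^{2}\geqslant-2\log(2\beta)$ , can be checked numerically. The sketch below uses the complementary error function for the exact tail; the grid of $t$ values and the choice $\beta=0.05$ are arbitrary illustrations.

```python
import math

def gaussian_upper_tail(t):
    """Exact P(nu >= t) for a standard Gaussian nu."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def tail_bound(t):
    """Upper bound exp(-t^2 / 2) / 2, valid for t >= 0."""
    return 0.5 * math.exp(-0.5 * t * t)

def threshold_for(beta):
    """Smallest t >= 0 with tail_bound(t) <= beta, for beta in (0, 1/2)."""
    return math.sqrt(-2.0 * math.log(2.0 * beta))

ts = [0.1 * k for k in range(0, 61)]
gaps = [tail_bound(t) - gaussian_upper_tail(t) for t in ts]
t_beta = threshold_for(0.05)
print(min(gaps), t_beta, gaussian_upper_tail(t_beta))
```

The gap between bound and exact tail is non-negative on the grid, with equality at $t=0$ , and the exact tail at the threshold lies well below $\beta$ , which shows that the sufficient condition is not tight.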

A.3. Proof of Lemma 2.16

Let $A\in\mathscr{Z}_{1}\otimes\mathscr{Z}_{2}$ be a measurable set. Denote $A_{z_{1}}=\{z_{2}\in\mathfrak{Z}_{2}:(z_{1},z_{2})\in A\}$ which is measurable. By Cavalieri’s principle and the independence assumption

[TABLE]

Now put $\Omega=\{\mathrm{d}\mathbf{P}^{Z_{1}|X=x}/\mathrm{d}\mathbf{P}^{Z_{1}|X=x^{\prime}}\leqslant e^{\alpha_{1}}\}\subseteq\mathfrak{Z}_{1}$ . Then $\mathbf{P}^{Z_{1}|X=x}(\Omega^{\mathsf{c}})\leqslant\beta_{1}$ since otherwise there would be a contradiction to approximate differential privacy. Hence,

[TABLE]

which shows the assertion.

Appendix B Proofs of Section 3

B.1. Proof of Proposition 3.1

The bias-variance decomposition for the estimator $\widehat{f}_{h}(t)$ is

$$\mathbf{E}[(\widehat{f}_{h}(t)-f(t))^{2}]=(f_{h}(t)-f(t))^{2}+\operatorname{Var}(\widehat{f}_{h}(t)),$$

where $f_{h}(t)=\mathbf{E}[\widehat{f}_{h}(t)]$ . We begin with the analysis of the bias. First recall that

[TABLE]

and due to centredness of the error added by the privacy mechanism

[TABLE]

Thus, using that $\mathcal{F}\left[K_{\operatorname{sinc}}\right](\cdot)=\mathbf{1}_{[-\pi,\pi]}(\cdot)$ , we obtain

[TABLE]

Let us now consider the variance, where we have to distinguish between the case of Laplace mechanism and Gaussian mechanism. We denote

[TABLE]

For the Laplace mechanism, we have by denoting $\xi\sim\mathcal{L}(1)$ that

[TABLE]

In a similar fashion, for the Gaussian mechanism, now letting $\xi\sim\mathcal{N}(0,1)$ , we have

[TABLE]

The statement of the proposition follows now by combining the obtained bounds for squared bias and variance.

B.2. Proof of Proposition 3.2

Let $\widehat{\ell}$ , $Q\in\mathcal{Q}_{\alpha}$ be arbitrary as in the statement of the proposition. Define $\psi_{n}>0$ via $\psi_{n}^{2}=n^{-\frac{2s-1}{2s+1}}$ . Let $f_{0,n}$ , $f_{1,n}$ be two functions in $\mathcal{S}(s,L)$ (to be specified later on) such that $(f_{0,n}(t)-f_{1,n}(t))^{2}\gtrsim\psi_{n}^{2}$ . Using a general reduction argument (see [Tsy09], Section 2.2) it can be shown that

[TABLE]

where the infimum is taken over all $\{0,1\}$ -valued test functions $\tau$ based on the observations $Z_{1},\ldots,Z_{n}$ and $\mathbf{P}_{\theta}$ denotes the distribution of $Z_{1},\ldots,Z_{n}$ if the true density of $X_{1},\ldots,X_{n}$ is $f_{\theta,n}$ . In view of [Tsy09], Theorem 2.2, Statement (iii), the claimed assertion follows if we can choose the functions $f_{0,n}$ and $f_{1,n}$ such that

(1) $f_{0,n},f_{1,n}\in\mathcal{S}(s,L)$ ,

(2) $(f_{0,n}(t)-f_{1,n}(t))^{2}\gtrsim\psi_{n}^{2}$ , and

(3) $\mathrm{KL}(\mathbf{P}_{0},\mathbf{P}_{1})\leqslant C<\infty$ for some $C$ independent of $n$ .

To construct such $f_{0,n},f_{1,n}$ we use ideas from Section 6 of [But01] and refer to this paper also for some of the computations. First, take a strictly positive probability density $f$ on $\mathbb{R}$ that is infinitely often continuously differentiable. Setting $\lVert f^{(s)}\rVert_{2}^{2}=\frac{1}{2\pi}\int_{\mathbb{R}}\lvert\mathcal{F}[f](\omega)\rvert^{2}\lvert\omega\rvert^{2s}\mathrm{d}\omega$ , we can further assume that $\lVert f^{(s)}\rVert_{2}\leqslant L$ . Then, for $\delta\in(0,1/2)$ , define the function $f_{0,n}$ by

[TABLE]

In order to define the second hypothesis $f_{1,n}$ we consider the auxiliary function $\widetilde{K}_{s}$ as introduced on p. 26 of [But01] (its construction in that paper is borrowed from [Tsy98]). In particular, note that $\widetilde{K}_{s}$ is compactly supported and satisfies $\lVert\widetilde{K}_{s}^{(s)}\rVert_{2}\leqslant 1-\delta/2$ (thus $\widetilde{K}_{s}\in\mathcal{S}(s,1)$ ) and $\widetilde{K}_{s}(0)\geqslant(1-\delta)C(s)>0$ . Set $h_{n}=(n(\exp(\alpha)-1)^{2})^{-1/(2s+1)}$ , and put

[TABLE]

for some constant $c>0$ . Defining $\gamma_{n,s}=\int g_{n,s}(x)\mathrm{d}x<\infty$ , set

[TABLE]

We now check conditions 1–3 from above.

Verification of 1:

The proof follows step by step along the lines of the one in [But01] and we omit the details. We only record the fact that

[TABLE]

which will be used below.

Verification of 2:

We have

[TABLE]

Now, since $g_{n}(t)=Ch_{n}^{s-\frac{1}{2}}$ and $\gamma_{n}=O(h_{n}^{s+\frac{1}{2}})$ , the last expression inside the outer absolute values is greater than $Ch_{n}^{s-\frac{1}{2}}$ for sufficiently large $n$ , say $n\geqslant n_{0}$ . Hence for $n\geqslant n_{0}$ ,

[TABLE]

which is the desired bound.

Verification of 3:

By Equation (14) in [DJW18] we have

[TABLE]

Now

[TABLE]

for $n$ sufficiently large. Thus, by (14), for $n$ sufficiently large

[TABLE]

Appendix C Proofs of Section 4

C.1. Proof of Proposition 4.1

Under Assumption (11), we have that $\sup_{0<\eta\leqslant h}\lvert f_{\eta}(t)-f(t)\rvert^{2}$ converges to zero as $h\to 0$ . Let $n\geqslant 3$ . By definition of $v^{2}(\cdot)$ , $\lambda(\cdot)$ and $\underline{h}_{n}=\log(\sqrt{n})/\sqrt{n}$ (since $\overline{h}_{n}=1$ ),

[TABLE]

hence $\liminf_{n\to\infty}v(\underline{h}_{n})\lambda(\underline{h}_{n})>0$ , and the set in the definition of $h^{\ast}_{n}$ is non-empty provided that $n$ is sufficiently large. Now, the bias-variance decomposition of the pseudo estimator is

[TABLE]

Let now $h_{0}$ be the minimizer in the definition of $r_{n}(t,f)$ . We distinguish the cases $h_{0}<ah^{\ast}_{n}$ and $h_{0}\geqslant ah^{\ast}_{n}$ . First, if $h_{0}<ah^{\ast}_{n}$ , then

[TABLE]

If $h_{0}\geqslant ah^{\ast}_{n}$ , then by the very definition of $h^{\ast}_{n}$ we obtain

[TABLE]

and thus $r_{n}(t,f)\gtrsim v^{2}(h^{\ast}_{n})\lambda^{2}(h^{\ast}_{n})$ also in this case.

C.2. Proof of Theorem 4.3

We consider the risk decomposition

[TABLE]

and study the two terms on the right-hand side separately.

Analysis of the first term (Case $\widehat{h}_{n}\geqslant h^{\ast}_{n}$ ). Note that the quantities $v(\cdot),\lambda(\cdot)$ satisfy $v(h)\geqslant v(h^{\prime})$ and $\lambda(h)\geqslant\lambda(h^{\prime})$ for $h^{\prime}\geqslant h$ . Thus, using the inequality $(a+b)^{2}\leqslant 2a^{2}+2b^{2}$ , we have for $h\leqslant h^{\prime}$ that

[TABLE]

By the definition of $\psi$ and $\widehat{h}_{n}$ , we obtain

[TABLE]

Hence (recall that we denote $f_{h}(t)=\mathbf{E}[\widehat{f}_{h}(t)]$ ),

[TABLE]

where we used the bound $\operatorname{Var}(\widehat{f}_{h^{\ast}_{n}})\leqslant v^{2}(h^{\ast}_{n})$ for the term $2\mathbf{E}[(\widehat{f}_{h^{\ast}_{n}}(t)-f_{h^{\ast}_{n}}(t))^{2}]$ and the definition of $h^{\ast}_{n}$ to bound the term $(f_{h^{\ast}_{n}}(t)-f(t))^{2}$ .

Analysis of the second term (Case $\widehat{h}_{n}<h^{\ast}_{n}$ ). For $h,\eta\in\mathcal{H}_{n}$ with $\eta<h$ , set

[TABLE]

Let $h$ in $\mathcal{H}_{n}$ . Then, by definition of $\widehat{h}_{n}$ ,

[TABLE]

and thus

[TABLE]

We obtain

[TABLE]

By definition of $h^{\ast}_{n}$ , for all $\eta,h\in\mathcal{H}_{n}$ with $\eta<h\leqslant h^{\ast}_{n}$ , it holds

[TABLE]

Now, for $\eta<h\leqslant h^{\ast}_{n}$ ,

[TABLE]

where $\zeta_{i}=\zeta_{i,h,\eta}=Z_{i,h}(t)-Z_{i,\eta}(t)-(f_{h}(t)-f_{\eta}(t))$ . Note that $\mathbf{E}\zeta_{i}=0$ and $\operatorname{Var}(\zeta_{i})\leqslant nv^{2}(h,\eta)$ . Now, by the Cauchy-Schwarz inequality,

[TABLE]

For the first term in the sum, we have

[TABLE]

Putting $\zeta_{i}^{\prime}=Z_{i,a^{-1}h}(t)-f_{a^{-1}h}(t)$ , we have

[TABLE]

On the one hand,

[TABLE]

on the other hand

[TABLE]

Hence,

[TABLE]

Moreover, for $a^{-1}h<h^{\ast}_{n}$ ,

[TABLE]

by the very definition of $h^{\ast}_{n}$ . Thus, altogether,

[TABLE]

and by the monotonicity of $v(\cdot)$ and $\lambda(\cdot)$ , for $\eta<h\leqslant h^{\ast}_{n}$

[TABLE]

Write $\zeta_{i}=\zeta_{i}^{(1)}+\zeta_{i}^{(2)}$ where $\zeta_{i}^{(1)}=K_{h}(X_{i}-t)-K_{\eta}(X_{i}-t)-(f_{h}(t)-f_{\eta}(t))$ and $\zeta_{i}^{(2)}=\frac{C_{\alpha^{\prime}\beta^{\prime}}}{\sqrt{2}h}\xi_{i,h}+\frac{C_{\alpha^{\prime}\beta^{\prime}}}{\sqrt{2}\eta}\xi_{i,\eta}$ with $\xi_{i,h},\xi_{i,\eta}$ i.i.d. $\sim\mathcal{L}(1)$ or $\zeta_{i}^{(2)}=\frac{C_{\alpha^{\prime}\beta^{\prime}}}{h}\xi_{i,h}+\frac{C_{\alpha^{\prime}\beta^{\prime}}}{\eta}\xi_{i,\eta}$ with $\xi_{i,h},\xi_{i,\eta}$ i.i.d. $\sim\mathcal{N}(0,1)$ for $i=1,\ldots,n$ . We have

[TABLE]

Consider $\mathbf{P}\left(\left\lvert\frac{1}{n}\sum_{i=1}^{n}\zeta_{i}^{(1)}\right\rvert>\frac{v(h,\eta)\lambda(\eta)}{2}\right)$ first. By Bernstein’s inequality (see Lemma D.1) with $b=4\lVert K\rVert_{\infty}/\eta$ ,

[TABLE]

Note that

[TABLE]

For any $h\in\mathcal{H}_{n}$ and $n$ large enough, it holds

[TABLE]

Thus

[TABLE]

For the probability in terms of $\zeta_{i}^{(2)}$ , we consider now the Gaussian case first. Using standard concentration results for the Gaussian distribution, we obtain

[TABLE]

where $t=v(h,\eta)\lambda(\eta)/2$ and $\sigma^{2}$ denotes the variance of the Gaussian random variable $\sum_{i=1}^{n}\zeta_{i}^{(2)}$ . Then,

[TABLE]

Thus,

[TABLE]

Combining (15) and (16), we obtain for the Gaussian case

[TABLE]

and we denote $\kappa^{\prime}=\frac{\kappa}{8}\wedge\frac{\kappa C_{\alpha^{\prime}\beta^{\prime}}}{32\lVert K\rVert_{\infty}}$ .

Let us now consider the probability in terms of $\zeta_{i}^{(2)}$ for the Laplace case which is a little bit more involved since the sum of two Laplace random variables is not Laplace anymore. We decompose

[TABLE]

Consider only the first probability on the right-hand side, the bound for the second one following analogously. By Bernstein’s inequality (see Lemma D.1, take the version with control on the moments applied with $t=v(h,\eta)\lambda(\eta)/4$ , $v^{2}=C_{\alpha^{\prime}\beta^{\prime}}^{2}/h^{2}$ and $b=C_{\alpha^{\prime}\beta^{\prime}}/h$ )

[TABLE]

and hence by using $\sqrt{n}\geqslant\sqrt{\kappa}\log(\overline{h}_{n}/\eta)$ ,

[TABLE]

Finally, we obtain with $\kappa^{\prime}=\frac{\kappa}{64}\wedge\frac{\kappa C_{\alpha^{\prime}\beta^{\prime}}}{32\lVert K\rVert_{\infty}}$ that

[TABLE]

in the Laplace case. Note that

[TABLE]

for both cases with different choices of $\kappa^{\prime}$ . Now,

[TABLE]

For sufficiently small $\gamma>0$ (our calculations show that $\gamma$ also has to satisfy $\kappa^{\prime}/2-\gamma-2>0$ ; such a choice is possible whenever $\kappa^{\prime}/2-2>0$ , which holds for $\kappa$ large enough), we have

[TABLE]

Recall that $v^{2}(h)\asymp\frac{1}{nh}+\frac{1}{nh^{2}}$ . Thus,

[TABLE]

The sums on the right-hand side converge and the bound for the case $\widehat{h}_{n}<h^{\ast}_{n}$ is negligible with respect to the upper bound $v^{2}(h^{\ast}_{n})\lambda^{2}(h^{\ast}_{n})$ .

Appendix D Bernstein inequality

The following version of the Bernstein inequality is taken from [Com15].

Lemma D.1.

Let $X_{1},\ldots,X_{n}$ be i.i.d. random variables and put $S_{n}=\sum_{i=1}^{n}(X_{i}-\mathbf{E}[X_{i}])$ . Then, for any $t>0$ ,

[TABLE]

where $\operatorname{Var}(X_{1})\leqslant v^{2}$ and $\lvert X_{1}\rvert\leqslant b$ (or $\mathbf{E}[\lvert X_{i}\rvert^{m}]\leqslant\frac{m!}{2}v^{2}b^{m-2}\text{ for }m\geqslant 2$ ).

Bibliography

[BCR84] Christian Berg, Jens Peter Reus Christensen and Paul Ressel “Harmonic analysis on semigroups: Theory of positive definite and related functions”, Graduate Texts in Mathematics 100, Springer-Verlag, New York, 1984, pp. x+289. DOI: 10.1007/978-1-4612-1128-0
[BD14] Rina Foygel Barber and John C. Duchi “Privacy and Statistical Risk: Formalisms and Minimax Bounds”, arXiv preprint, available at https://arxiv.org/abs/1412.4451v1, 2014
[BL96] Lawrence D. Brown and Mark G. Low “A constrained risk inequality with applications to nonparametric functional estimation”, Ann. Statist. 24.6, 1996, pp. 2524–2535. DOI: 10.1214/aos/1032181166
[BT04] Alain Berlinet and Christine Thomas-Agnan “Reproducing kernel Hilbert spaces in probability and statistics”, with a preface by Persi Diaconis, Kluwer Academic Publishers, Boston, MA, 2004, pp. xxii+355. DOI: 10.1007/978-1-4419-9096-9
[But+19] Cristina Butucea, Amandine Dubois, Martin Kroll and Adrien Saumard “Local differential privacy: Elbow effect in optimal density estimation and adaptation over Besov ellipsoids”, arXiv preprint, available at http://arxiv.org/abs/1903.01927, 2019
[But01] Cristina Butucea “Exact adaptive pointwise estimation on Sobolev classes of densities”, ESAIM Probab. Statist. 5, 2001, pp. 1–31. DOI: 10.1051/ps:2001100
[Cav01] Laurent Cavalier “On the problem of local adaptive estimation in tomography”, Bernoulli 7.1, 2001, pp. 63–78. DOI: 10.2307/3318602
[Com15] Fabienne Comte “Estimation non-paramétrique”, Paris: Spartacus, 2015
