Local differential privacy: Elbow effect in optimal density estimation and adaptation over Besov ellipsoids
Cristina Butucea, Amandine Dubois, Martin Kroll, Adrien Saumard

TL;DR
This paper investigates the limits of non-parametric density estimation under local differential privacy constraints, revealing an elbow effect in convergence rates and proposing wavelet-based estimators that achieve near-optimal performance.
Contribution
It introduces a lower bound on estimation rates under local differential privacy and develops wavelet estimators that adaptively attain these bounds across Besov spaces.
Findings
Lower bounds show deterioration of convergence rates due to privacy constraints.
A wavelet estimator attains the lower bound when p ≥ r.
An adaptive wavelet estimator achieves near-optimal rates in all cases.
Cristina Butucea, CREST, ENSAE, Institut Polytechnique de Paris, 5 avenue Henry Le Chatelier, F-91120 Palaiseau
Amandine Dubois, CREST-ENSAI, Campus de Ker-Lann - Rue Blaise Pascal - BP 37203 - 35172 BRUZ cedex
Martin Kroll, CREST, ENSAE, Institut Polytechnique de Paris, 5 avenue Henry Le Chatelier, F-91120 Palaiseau
Adrien Saumard, CREST-ENSAI, Campus de Ker-Lann - Rue Blaise Pascal - BP 37203 - 35172 BRUZ cedex
Abstract.
We address the problem of non-parametric density estimation under the additional constraint that only privatised data are allowed to be published and available for inference. For this purpose, we adopt a recent generalisation of classical minimax theory to the framework of local $\alpha$-differential privacy and provide a lower bound on the rate of convergence over Besov spaces $B^s_{pq}$ under mean integrated $L^r$-risk. This lower bound is deteriorated compared to the standard setup without privacy, and reveals a twofold elbow effect. In order to fulfil the privacy requirement, we suggest adding suitably scaled Laplace noise to empirical wavelet coefficients. Upper bounds within (at most) a logarithmic factor are derived under the assumption that $\alpha$ stays bounded as $n$ increases: A linear but non-adaptive wavelet estimator is shown to attain the lower bound whenever $p \geq r$ but provides a slower rate of convergence otherwise. An adaptive non-linear wavelet estimator with appropriately chosen smoothing parameters and thresholding is shown to attain the lower bound within a logarithmic factor for all cases.
Key words and phrases:
Density estimation, Besov classes of functions, Local differential privacy, Lower bounds, Minimax rates, Adaptive estimation, Wavelet thresholding
2010 Mathematics Subject Classification:
62G07 (primary), and 62G20 (secondary)
1. Introduction
Problem statement
In the modern information age, increasingly more institutions are collecting and storing data. Provided that a certain amount of privacy is guaranteed, some of these institutions might be willing to provide access to selected data sets. Examples of such data may include information about participants in a medical study, clients of a web service, or persons interviewed in a scientific survey. In this framework, the following questions arise naturally: How can data be sufficiently anonymised, given a rigorous definition of privacy, and what are the consequences for subsequent data analyses resulting from the chosen anonymisation procedure? The answer to these questions depends on several interacting parameters, namely the privacy definition at hand, the potential extent of collaboration of the involved data holding entities, and the kind of data mining tasks that should be feasible based on the private data.
In this paper, we consider the problem of non-parametric density estimation under local differential privacy as a special instance of the general problem sketched in the previous paragraph: For $i = 1, \ldots, n$, the $i$-th data holder observes a real-valued random variable $X_i$ distributed according to a probability density function $f$. The aim is that every data holder releases an anonymised view $Z_i$ of $X_i$ such that the privacy notion of local differential privacy, which is introduced next, is satisfied and that the density $f$ can be estimated from the data $Z_1, \ldots, Z_n$ in an optimal way.
Local differential private estimation
The notion of local differential privacy aggregates two different concepts, namely local privacy and differential privacy, that we explain in the sequel.
The qualitative notion of local privacy characterises how the different entities holding the data might interact to generate a private release $Z_1, \ldots, Z_n$. It is opposed to the concept of global privacy, where the respective data holders share confidence in a common curator who has access to the ensemble of non-masked data and generates the releasable data from this complete information. In the local setup, such an authority that is trusted by all the parties does not exist. However, some amount of interaction between the different parties is still allowed. The releasable data are obtained by successively applying suitable Markov kernels. Given $X_i = x_i$ and $Z_1 = z_1, \ldots, Z_{i-1} = z_{i-1}$, the $i$-th data holder draws
[TABLE]
for some Markov kernel $Q_i$, where the measurable spaces of the non-private and private data are denoted with $(\mathcal{X}, \mathscr{X})$ and $(\mathcal{Z}, \mathscr{Z})$, respectively. An important special case is that of non-interactive local privacy, where the random value of $Z_i$ depends on $X_i$ only and must not depend on the preceding values $Z_1, \ldots, Z_{i-1}$. More precisely, in the non-interactive case we have
[TABLE]
for some Markov kernel $Q$ that no longer depends on the index $i$. The non-interactive scenario seems to be more attractive in practice since no communication between the data holders is assumed, and it is balanced in the sense that no participant obtains any information about any other participant’s data. From a mathematical point of view, however, allowing interactive procedures as well does not lead to more technical proofs. Thus, we allow interactive methods in our minimax analysis, although the anonymisation techniques proposed in this paper are exclusively non-interactive. Let us mention that for some tasks interactive mechanisms provide natural and attractive alternatives (for instance, for private estimation in generalized linear models; see [DJW18], Section 5.2.1).
The notion of differential privacy is a quantitative one and introduces a condition that makes the problem at hand mathematically tractable. We provide its definition for the locally private case only and refer the reader to [WZ10] for a definition in the global case.
Definition 1.1.
A sequence of Markov kernels $(Q_i)_{i = 1, \ldots, n}$ provides $\alpha$-differential privacy if
[TABLE]
In the non-interactive case, this condition is replaced with
[TABLE]
We denote by $\mathcal{Q}_\alpha$ the set of all locally $\alpha$-differentially private Markov kernels.
Thus, the parameter $\alpha$ quantifies the amount of privacy that is guaranteed: Setting $\alpha = 0$ ensures perfect privacy, whereas letting $\alpha$ tend to infinity softens the privacy restriction. In the non-interactive case, let us suppose that the Markov kernel $Q$ has a density $q(\cdot \mid x)$ with respect to some dominating measure. Then, the defining property of $\alpha$-differential privacy is equivalent to
[TABLE]
A consequence of the definition of $\alpha$-differential privacy is plausible deniability of the data in the following sense: Given the private view $Z_i$ only, any test of the null hypothesis $H_0 \colon X_i = x$ against the alternative $H_1 \colon X_i = x'$ with prescribed type I error probability $\gamma$ has power bounded from above by $\gamma e^{\alpha}$ (see [WZ10], Theorem 2.4).
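As a minimal illustration of the non-interactive condition in Definition 1.1, the following sketch checks the worst-case likelihood ratio of binary randomised response, a classical mechanism that is not the one proposed in this paper. The function names and the choice of truth probability $e^{\alpha}/(1+e^{\alpha})$ are assumptions made for this example only.

```python
import math
import random


def randomised_response(bit, alpha, rng):
    """Release `bit` truthfully with probability e^alpha / (1 + e^alpha), else flip it."""
    p_truth = math.exp(alpha) / (1.0 + math.exp(alpha))
    return bit if rng.random() < p_truth else 1 - bit


def channel_prob(z, x, alpha):
    """Probability Q({z} | X = x) of the randomised-response channel."""
    p_truth = math.exp(alpha) / (1.0 + math.exp(alpha))
    return p_truth if z == x else 1.0 - p_truth


def worst_case_ratio(alpha):
    """Supremum over outputs z and inputs x, x' of Q({z} | x) / Q({z} | x')."""
    return max(
        channel_prob(z, x, alpha) / channel_prob(z, x_prime, alpha)
        for z in (0, 1)
        for x in (0, 1)
        for x_prime in (0, 1)
    )


if __name__ == "__main__":
    alpha = 0.5
    rng = random.Random(0)
    print(randomised_response(1, alpha, rng))
    # The ratio should coincide with e^alpha up to rounding.
    print(worst_case_ratio(alpha), math.exp(alpha))
```

In this toy channel the ratio equals $e^{\alpha}$ exactly, so the privacy constraint is tight; smaller $\alpha$ pushes the truth probability towards $1/2$.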
Rate optimal density estimation over Besov ellipsoids
Let us briefly review some well-known results on non-parametric density estimation in the non-private setup where $X_1, \ldots, X_n$ can be observed directly. This classical model provides a natural benchmark for the model with additional privacy restrictions, and keeping the benchmark results in mind helps in understanding the results under privacy.
Density estimation from a sample of observations is one of the paradigmatic problems in non-parametric statistics. A popular framework is that of minimax optimal estimation: Given a loss function (that is, a function mapping a pair of density functions to some non-negative real number) and any class of candidate density functions, the quantity of interest is the minimax risk
[TABLE]
where the infimum is taken over all estimators (that is, -measurable functions). In this setup, an estimator is called rate optimal if
[TABLE]
Several function classes, loss functions and types of estimators have been intensively studied for the density estimation problem (see [Tsy09] and [GN16] for comprehensive overviews of the topic). Throughout this paper, we consider the integrated risk associated to $L^r$-loss defined by $\ell(f, g) = \|f - g\|_r^r$ for $r \geq 1$. For the Besov spaces to be considered in the sequel, wavelet methods have turned out to be particularly convenient. Given a father wavelet $\phi$ and a mother wavelet $\psi$ associated to it, verifying some sufficient conditions (see conditions (5.10)–(5.12) in [Här+98]), and an integer $j_0$, a wavelet basis of $L^2(\mathbb{R})$ is given by
[TABLE]
Given such a basis, the probability density $f$ admits the following formal expansion (in the $L^2$ sense):
[TABLE]
where the wavelet coefficients are defined as
[TABLE]
An attractive property of wavelet expansions such as (1.4) is that membership of $f$ in a Besov space can be characterised in terms of its wavelet coefficients with respect to a well chosen wavelet basis. In the sequel, we will work under the following assumption on the father wavelet $\phi$.
Assumption 1.1.
Following [Här+98], we assume that the father wavelet function generates a multiresolution analysis of , that it is times weakly differentiable for some integer such that , and that its derivative satisfies a.e. Moreover, we assume that there exists a bounded, non-increasing function on such that and that both and .
If the father wavelet function verifies Assumption 1.1 then, given parameters and , the fact that belongs to the Besov space is equivalent to where
[TABLE]
for and the usual modification if . Fixing such a wavelet basis, we consider Besov ellipsoids defined as
[TABLE]
Since our interest is in density estimation, a quite natural class to consider is
[TABLE]
where denotes the support of the function . Note that we consider here the Besov smoothness of as a function defined on the whole real line, or, equivalently, that belongs to a periodic Besov class. It would equally be possible to define Besov smoothness over the support . Then the wavelet basis has to be boundary corrected so that it detects the smoothness on this interval only and not the potential lack of smoothness of at its boundary. We refer the reader to [GN16] for boundary corrected wavelets, that also dispose of all the properties that we need in the sequel.
It is well-known [GN16, Här+98, Don+96] that
[TABLE]
and these rates are optimal or suboptimal by a logarithmic factor only (see [Här+98] for an extensive discussion). The structural change of the rate between the dense zone (where $p > r/(2s+1)$) and the sparse zone (where $p \leq r/(2s+1)$) is sometimes called an elbow effect.
Moreover, in the dense case, we can distinguish the homogeneous zone where $p \geq r$ and the non-homogeneous zone where $r/(2s+1) < p < r$. In the homogeneous case, linear wavelet estimators of the form
[TABLE]
with , , and appropriately chosen are rate optimal whereas linear procedures are necessarily sub-optimal in the non-homogeneous case (see [Här+98] and references therein). In this latter scenario as well as in the sparse case, non-linear estimators based on wavelet thresholding turn out to be optimal at least up to logarithmic factors.
Minimax framework under privacy constraints
Let us now describe how to extend the classical minimax setup in order to encompass the framework of local differential privacy. Since not only the estimation procedure but also the Markov kernels guaranteeing local $\alpha$-differential privacy can be chosen freely, it is natural to replace (1.2) with the local $\alpha$-differential minimax risk defined as
[TABLE]
Here the infimum is taken both over all estimators of $f$ that are measurable with respect to the private views $Z_1, \ldots, Z_n$ and over all Markov kernels guaranteeing local $\alpha$-differential privacy. A tuple consisting of a privacy mechanism and an estimator is rate optimal (with respect to the local $\alpha$-differential private risk) if
[TABLE]
The local $\alpha$-differential minimax risk defined above, as well as the construction of optimal privacy mechanisms and estimators, is the principal interest of the rest of the paper.
Related work
Research on statistical estimation under privacy constraints is rather recent. A landmark paper is [WZ10], which initiated research on the subject and discussed density estimation via histograms and orthogonal series in the global privacy setup. In the same global framework, the article [HRW13] considers anonymization of functional data and discusses kernel density estimators as the main example. Local $\alpha$-differential privacy was intensively studied in [DJW13] and the companion article [DJW18]. In [DJW13] the authors show that the well-known technique of randomized response from survey statistics can be interpreted under the umbrella of local $\alpha$-differential privacy. In the context of density estimation, [DJW13] established minimax rates of convergence for the mean integrated squared error over Sobolev classes with arbitrary smoothness parameter $\beta$. They establish the minimax rate of order $(n\alpha^2)^{-2\beta/(2\beta+2)}$ for the mean integrated squared error over Sobolev classes with $\beta = 1$ and show that this optimal rate can be attained by Laplace perturbation of empirical histogram coefficients. The papers [DJW13, DJW18] also provide results for Sobolev classes with higher degrees of smoothness ($\beta > 1$), but in this case a mere perturbation of the empirical Fourier coefficients does not lead to a rate optimal method (see [DJW13], Observation 1 for the non-optimality of this approach). By means of a more sophisticated sampling technique (see [DJW13], p. 11 or [DJW18], Section 5.2.2), however, the authors derive the minimax rate of convergence, which is also $(n\alpha^2)^{-2\beta/(2\beta+2)}$ in the general case. Furthermore, [DJW13] provides private versions of classical information-theoretical bounds that allow standard lower bound techniques to be applied in the private setup as well.
In [RS18], the estimation of linear functionals in the framework of local privacy is considered and a characterisation of the rates of convergence in terms of moduli of continuity is obtained which is in parallel to well-known results for the non-private setup [DL91]. This general analysis contains the private estimation of a probability density at a fixed point under mean squared error as a special case.
Main results
In Section 2, in addition and in formal analogy to (1.5), we derive, under similar technical assumptions, the following lower bound on the private minimax risk:
[TABLE]
This lower bound is complemented by corresponding upper bound results: The anonymisation technique used to create the private views of the non-releasable data consists in an appropriately scaled version of the classical Laplace mechanism applied to the empirical wavelet coefficients (Section 3). The wavelet estimators considered in Sections 4 and 5 are based on the privatised data only. As in the non-private case, a linear wavelet estimator attains the given rate in the homogeneous case, that is, whenever $p \geq r$ (Section 4). In Section 5, we study non-linear estimators and show that an estimator using hard thresholding can nearly attain the lower bounds both in the dense and in the sparse zone.
Notational conventions
For real numbers we write . We denote with a generic constant that might change with every appearance. For two sequences , we denote by that there exist some constant and a fixed integer number such that , for all . We say that , if both and . If , we denote by the fact that as . We recall that a centred Laplace distribution with parameter has the probability density function , for all real number . In particular, if , then for all .
2. Lower bounds
The purpose of this section is to derive (1.6) and hence to provide an analogue of (1.5) under local $\alpha$-differential privacy. To this end, we proceed in two steps. The first lower bound, given in Proposition 2.1, is stronger in the private dense zone ($p > r/(s+1)$), whereas the second one, given in Proposition 2.2, dominates in the private sparse zone where $p \leq r/(s+1)$. An essential tool for both proofs is a strong information theoretical inequality (our Lemma A.1) proved in [DJW18], which bounds the Kullback-Leibler divergence between any two distributions that have been processed through an arbitrary channel guaranteeing local $\alpha$-differential privacy. We begin with the lower bound that dominates in the dense zone.
Proposition 2.1.
Let and let . Then,
[TABLE]
where the infimum is taken over all estimators based on the private views and all Markov kernels guaranteeing local $\alpha$-differential privacy.
The proof of Proposition 2.1 is based on a reduction of the class to a finite number of hypotheses indexed by the vertices of a hypercube of suitable dimension. It is given in Section A.1 in the appendix.
The following proposition complements Proposition 2.1 in stating a lower bound that is stronger in the private sparse zone.
Proposition 2.2.
Let . Let , and let . Then,
[TABLE]
where the infimum is taken over all estimators based on the private views and all channels providing local $\alpha$-differential privacy.
The proof of Proposition 2.2 is given in Section A.2 in the appendix.
Taking the maximum of the lower bounds obtained in Propositions 2.1 and 2.2 yields (1.6). In addition to our novel lower bounds, the known bounds (1.5) from the non-private framework still hold true under local -differential privacy since processing the original data through a privacy mechanism can be interpreted equivalently as imposing a restriction on the set of admissible estimators in (1.2). More precisely, the constraint of local -differential privacy confines the set of potential estimators to those of the form where and is any measurable function. Thus,
[TABLE]
where the quantity is defined in (1.5). Hence, the following corollary holds.
Corollary 2.1.
Let the assumptions of Propositions 2.1 and 2.2 hold true. Then,
[TABLE]
Note that the frontier between the dense and the sparse zone in the private framework differs from the one in the non-private framework, which leads to a partition into three regimes for the lower bound and a twofold elbow effect. These lower bounds match the upper bounds derived in Sections 4 and 5, at most up to logarithmic factors, whenever $\alpha$ stays bounded as $n$ increases. In addition, the bounds from the non-private setup dominate provided that $\alpha$ increases sufficiently fast in terms of $n$.
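The location of the two elbows can be checked by a short calculation. The following is only a sketch under an assumption about the exponents: we take the dense and sparse exponents of the non-private rate to be $rs/(2s+1)$ and $r(s-1/p+1/r)/(2(s-1/p)+1)$, and those of the private lower bound to be $rs/(2s+2)$ and $r(s-1/p+1/r)/(2(s+1-1/p))$. These forms are not restated in this section and should be compared against (1.5) and (1.6).

```latex
% Non-private elbow: equate the dense and the sparse exponent.
\frac{s}{2s+1} = \frac{s - 1/p + 1/r}{2(s - 1/p) + 1}
\iff 2s^2 - \frac{2s}{p} + s = 2s^2 - \frac{2s}{p} + s + \frac{2s+1}{r} - \frac{1}{p}
\iff p = \frac{r}{2s+1}.

% Private elbow: the same comparison with the assumed private exponents.
\frac{s}{2s+2} = \frac{s - 1/p + 1/r}{2(s + 1 - 1/p)}
\iff s^2 + s - \frac{s}{p} = s^2 + s - \frac{s}{p} + \frac{s+1}{r} - \frac{1}{p}
\iff p = \frac{r}{s+1}.
```

Since $r/(2s+1) < r/(s+1)$ for $s > 0$, the two frontiers are distinct under these assumptions, which is consistent with the three regimes described above.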
3. Privacy mechanisms
Let us denote with the real-valued random variables that represent the non-private observations held by the different data holders. We assume that for . In particular, the support of the density is contained in the interval . In this section, we introduce a non-interactive privacy mechanism creating a private release based on the non-private sample that satisfies the defining property of -differential privacy. For this purpose, we consider a wavelet basis as in (1.3). We assume in the sequel that the following condition on the parent wavelets is satisfied:
[TABLE]
The idea of the proposed anonymisation technique is to mask the empirical wavelet coefficients and for certain values of . A consequence of (W1) and the compact support of is that for any and any fixed resolution level , the corresponding and can a priori be non-zero for a finite number of only. We denote the set of with potentially non-zero by . Analogously, for , the set of with potentially non-zero is denoted with .
Let us now define two privacy mechanisms that will turn out to be convenient for the purposes of this paper. It will be sufficient to consider from now on.
First privacy mechanism
For , , define
[TABLE]
where are independent Laplace distributed random variables with parameter ,
[TABLE]
for with .
Second privacy mechanism
For , , define
[TABLE]
where are independent Laplace distributed random variables with parameter ,
[TABLE]
for with and some .
Note that both privacy mechanisms in (3.1) and (3.2) are non-interactive because $Z_i$ does not depend on $Z_{i'}$ for $i' \neq i$. The following proposition shows that both privacy mechanisms satisfy the condition of $\alpha$-differential privacy.
Proposition 3.1.
The privacy mechanisms given in (3.1) and (3.2) are locally $\alpha$-differentially private.
Proof.
By definition of the privacy mechanism in (3.1), the conditional density of given can be written as
[TABLE]
Thus, by the reverse and the ordinary triangle inequality,
[TABLE]
Note that for any fixed and arbitrary , holds only for at most different , and the same argument is valid for . Thus,
[TABLE]
For the privacy mechanism (3.2), analogous calculations yield for the conditional density of given that
[TABLE]
where we used that and . ∎
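The argument in the proof of Proposition 3.1 can be mimicked numerically. The sketch below is not the mechanism (3.1) itself: it assumes the Haar father wavelet on $[0, 1]$, a single resolution level $j$, and a noise scale equal to the $\ell^1$-sensitivity $2 \cdot 2^{j/2}$ divided by $\alpha$, whereas the constants in the paper depend on the chosen wavelet. All function names are hypothetical.

```python
import random


def haar_father_coeffs(x, j):
    """Values phi_{j,k}(x), k = 0, ..., 2^j - 1, of the Haar father wavelet on [0, 1]."""
    m = 2 ** j
    k_hit = min(int(x * m), m - 1)  # the single index k whose support contains x
    return [2 ** (j / 2) if k == k_hit else 0.0 for k in range(m)]


def laplace_noise(rng):
    """One draw from the centred Laplace distribution with scale 1."""
    # The difference of two independent Exp(1) variables is Laplace(1).
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def noise_scale(j, alpha):
    """Assumed scale: at most two coefficients differ between x and x', each by 2^{j/2}."""
    return 2 * 2 ** (j / 2) / alpha


def privatise(x, j, alpha, rng):
    """Private view of x: Haar coefficients perturbed by independent Laplace noise."""
    scale = noise_scale(j, alpha)
    return [c + scale * laplace_noise(rng) for c in haar_father_coeffs(x, j)]


def log_density_ratio(z, x, x_prime, j, alpha):
    """log q(z | x) - log q(z | x') for the product Laplace density of the view."""
    scale = noise_scale(j, alpha)
    a = haar_father_coeffs(x, j)
    b = haar_father_coeffs(x_prime, j)
    return sum((abs(zk - bk) - abs(zk - ak)) / scale for zk, ak, bk in zip(z, a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    j, alpha = 3, 0.7
    z = privatise(0.2, j, alpha, rng)
    # By the reverse triangle inequality this never exceeds alpha in absolute value.
    print(abs(log_density_ratio(z, 0.2, 0.9, j, alpha)) <= alpha)
```

The bound on the log-ratio follows from the reverse triangle inequality, as in the proof above; wavelets with larger support require a larger sensitivity bound and hence more noise.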
4. Upper bound for linear wavelet estimators
The expansion (1.4) suggests to consider estimators of the form
[TABLE]
with appropriate estimators and of and , respectively. Note that in the local private framework, estimators of the wavelet coefficients are allowed to depend on the private views only but not on the hidden . For the results concerning the linear estimator in this section, it suffices to consider the case . In this case we put and define a linear wavelet estimator through
[TABLE]
Since the Laplace noise added by the privacy mechanism is centred, this definition is natural and provides an unbiased estimate of the true wavelet coefficient.
The following proposition provides an upper bound for the estimator in the so-called matched case when $p = r$. Its proof is given in Appendix B.
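A linear estimator of this type can be sketched in a few lines. The example assumes the Haar father wavelet on $[0, 1]$ at a single level $j$ and a Laplace scale of $2 \cdot 2^{j/2}/\alpha$; both are simplifications chosen for illustration rather than the exact setting of (3.1) and (4.1), and the function names are hypothetical.

```python
import random


def private_views(sample, j, alpha, rng):
    """Noisy Haar father coefficients at level j for every observation in [0, 1]."""
    m = 2 ** j
    amp = 2 ** (j / 2)
    scale = 2 * amp / alpha  # assumed scale: l1-sensitivity divided by alpha
    views = []
    for x in sample:
        k_hit = min(int(x * m), m - 1)
        views.append([
            (amp if k == k_hit else 0.0)
            + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
            for k in range(m)
        ])
    return views


def linear_estimator(views, j):
    """Return the function x -> sum_k mean_i(Z_{i,k}) * phi_{j,k}(x) for the Haar basis."""
    n, m = len(views), 2 ** j
    # Averaging the private views gives an unbiased coefficient estimate.
    coeff_hat = [sum(v[k] for v in views) / n for k in range(m)]

    def f_hat(x):
        k_hit = min(int(x * m), m - 1)
        return coeff_hat[k_hit] * 2 ** (j / 2)

    return f_hat


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.random() for _ in range(20000)]  # uniform density on [0, 1]
    f_hat = linear_estimator(private_views(sample, 2, 1.0, rng), 2)
    print(f_hat(0.3))  # should be close to the true density value 1
```

The estimator only touches the private views, never the raw sample. The level $j$ plays the role of the smoothing parameter: a larger $j$ reduces bias but inflates the noise scale.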
Proposition 4.1.
Assume that the father wavelet satisfies Assumption 1.1. Let and defined as in (3.1). Then
[TABLE]
In particular, choosing such that
[TABLE]
we obtain
[TABLE]
The upper bound (4.3) suggests the following interpretation: As long as , the estimator attains the rate known to be optimal when the sample is available. However, as soon as , this standard rate is deteriorated and the slower rate is attained. As in [DJW18], the alteration of the rate in comparison to the non-private framework concerns both the effective sample size (that changes from to ) and the exponent appearing in the rate. In contrast to the procedure suggested in [DJW18], however, the privacy mechanism (3.1) consists in a mere perturbation of the empirical wavelet coefficients by Laplace noise, and no further sampling technique is necessary to obtain a privacy channel enabling rate optimal estimation of .
Although the risk bound of Proposition 4.1 is valid only in the matched case, it can be extended to the case $p > r$ by means of the following corollary. Its proof is given in Appendix B.
Corollary 4.1.
Assume that the father wavelet satisfies Assumption 1.1. Let and defined as in (3.1), and put by . Then, choosing as in (4.2) yields
[TABLE]
Corollary 4.1 together with Proposition 2.1 shows that the estimator is of optimal order in the dense homogeneous zone where (which is equivalent to ) and for in . In analogy to [Don+96], it would be possible to suggest a non-linear estimation procedure depending on that is optimal (up to logarithmic factors in some cases) in the non-homogeneous dense case and in the sparse case as well. However, in Section 5, we directly propose a non-linear estimator that is adaptive to the smoothness of the underlying density (as well as to the other parameters and of the Besov space).
5. Upper bounds for the non-linear adaptive estimator
In this section, the privacy mechanism is given by (3.2) in Section 3. We study the theoretical properties of the non-linear wavelet estimators of the form
[TABLE]
where
[TABLE]
and as in Section 4 (the choice of and the value of the numerical constant are specified in Theorem 5.1 and its proof below). Thus, non-linearity enters only with respect to the estimation of the detail coefficients .
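Hard thresholding of the detail coefficients can be sketched as follows. The dictionary layout, the function name, and the convention of keeping coefficients whose absolute value is at least the threshold are assumptions for this example; the level-dependent thresholds of Theorem 5.1 are not reproduced here.

```python
def hard_threshold_by_level(detail_coeffs, thresholds):
    """Keep an estimated detail coefficient only if its magnitude reaches the level's threshold.

    detail_coeffs: dict mapping level j to the list of estimated coefficients at that level.
    thresholds:    dict mapping level j to a non-negative threshold (hypothetical values).
    """
    return {
        j: [c if abs(c) >= thresholds[j] else 0.0 for c in coeffs]
        for j, coeffs in detail_coeffs.items()
    }


if __name__ == "__main__":
    noisy = {3: [0.5, -2.0], 4: [0.1, 1.0]}
    print(hard_threshold_by_level(noisy, {3: 1.0, 4: 1.0}))
```

Coefficients that survive are kept unchanged rather than shrunk, which distinguishes hard from soft thresholding.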
Theorem 5.1.
Let the father wavelet satisfy Assumption 1.1 for some integer . Let the private views of the sample be generated with the privacy mechanism in (3.2). Consider the estimator defined in (5.1) with
* such that ,*
* where , are such that*
[TABLE]
* for some and with introduced in the definition of the second privacy mechanism,*
* for and some sufficiently large constant (for instance, works).*
Then, the risk bound
[TABLE]
where
[TABLE]
and where
[TABLE]
for some .
The proof of Theorem 5.1 is given in Appendix C. Note that both the privacy mechanism and the estimator in Theorem 5.1 are independent of the quantities $s$, $p$, $q$, and $L$. Hence, the proposed procedure is adaptive.
6. Discussion
In this article, we have suggested refined methods for density estimation under the constraint of local $\alpha$-differential privacy. By the use of estimators based on wavelet expansions, we have been able to obtain adaptive procedures that attain the minimax rate of convergence up to an additional logarithmic factor only. To the best of our knowledge, adaptation to smoothness has not been considered in the framework of private estimation so far. Moreover, in allowing for general $L^r$-risk and Besov ellipsoids we have widened the range of results in the privacy framework, which has so far focused on $L^2$-risk and Sobolev ellipsoids.
A significant difference between our approach and the one suggested in Section 5.2.2 of [DJW18] concerns the privacy mechanism: Whereas the procedure in [DJW18] is built on a rather sophisticated sampling strategy aiming at the perturbation of empirical Fourier coefficients, our privacy mechanism consists in a simple Laplace perturbation of empirical wavelet coefficients. In [DJW18] it has been observed (see the last paragraph of Section 5.2.2 in that paper) that such an approach is not feasible for the Fourier basis since it would lead to a suboptimal rate (under -risk) of order over Sobolev ellipsoids of smoothness instead of the optimal rate . A heuristic explanation for the easier accessibility of the problem by means of wavelet bases is given by their well-known localisation properties in contrast to the global Fourier basis.
Note that wavelet methods in the non-private framework do not necessarily suffer from a logarithmic loss in the rate (see, for instance, [Don+96] where an additional logarithmic loss only appears in the dense zone). The fact that we encounter this type of loss in our private scenario is caused by the term in the definition of the privacy mechanism (3.2) and might be explained by the pointwise nature of the $\alpha$-differential privacy constraint. Whether such logarithmic losses can be circumvented, and if so how, remains open and provides an interesting direction for future research.
Finally, let us sketch the connection between local private estimation in the non-interactive setup and statistical inverse problems, in particular, density deconvolution: On the one hand, in density deconvolution, the statistician is given a noisy sample where for and . Here, the density is the quantity of interest and an error density which is (at least in the overwhelming part of the literature) supposed to be known. In this setup, the are distributed according to the density where
[TABLE]
is the convolution of with the error density . It is well-known that the difficulty of reconstructing from the sample is linked with the degree of ill-posedness of the inverse problem . The latter can be described either in terms of the sequence of eigenvalues of ( denotes the adjoint operator of the linear operator ) or in terms of the decay of the Fourier transform of the error density . General inverse problems of the form have been thoroughly investigated in [Ker+07] in the framework of a Gaussian white noise model. For Besov smooth signals and for some , [Ker+07] derived adaptive rates of estimation of proportional to
[TABLE]
On the other hand, the statistician who is given the non-interactive privatised sample is confronted with the problem of recovering from a sample from the mixture density
[TABLE]
which is a special instance of an inverse problem and strongly resembles (6.1). In contrast to (6.1), however, the operator is now not a priori given as a component of the problem but constitutes rather a part of its solution. In the local differential privacy framework, the statistician should select the operator , corresponding to the choice of a privacy mechanism, subject to the two following constraints. First, the condition (1.1) concerning -differential privacy must hold. Second, the least possible amount of information should be smoothed out by the operator . More precisely, denoting with the degree of ill-posedness as above, the proofs of the lower bounds suggest that the least admissible value for is . Our privacy mechanisms, that is, our choices of satisfy both constraints by leading to an overall estimation procedure that is nearly minimax. We emphasize that the above interpretation of the local differential private estimation problem does not rule out privacy mechanisms that add noise directly to the random variables in principle. As already mentioned, [DJW18] have noted that adding Laplace noise directly to the observations cannot lead to an optimal procedure. Indeed, the convolution operator in this case has degree of ill-posedness corresponding to which yields a too slow rate.
Appendix A Proofs of Section 2
We distinguish in the sequel the dense case and the sparse case that require different explicit constructions. However, for both proofs of the lower bounds we need the existence of a function with the following properties (see [Här+98]):
is a probability density,
,
,
on some interval .
In particular, .
The main tool in the proof of the lower bounds is adapted from [DJW18]. It allows us to reduce the problem to the study of the likelihoods of the non-privatized data and quantifies the loss of information in the process.
Suppose that we are given a finite indexed family of distributions . Let denote a random variable that is uniformly distributed over . Conditionally on , suppose we sample a random vector according to the product measure . Suppose that we draw an -locally private sample according to a channel . Conditioned on , is distributed according to the measure given by
[TABLE]
where denotes the joint distribution on of the private sample conditioned on . In this setup, we have the following inequality.
Lemma A.1.
[Based on [DJW18], Theorem 1] Let . For any -locally differentially private conditional distribution and any , , we have in the above setting
[TABLE]
Lemma A.1 quantifies the property that $\alpha$-differential privacy acts as a contraction on the space of probability measures.
A.1. Proof of Proposition 2.1
It is sufficient to prove the lower bound for sufficiently large $n$ (the remaining finitely many values of $n$ might merely further reduce the value of the numerical constant ). Let be the function introduced above. For fixed (the choice of which will be specified later) define as the maximal subset of such that and if with . Note that . Define
[TABLE]
where for sufficiently small and . For sufficiently small, it holds , which ensures that is non-negative for all . One can easily check that and for all . Moreover, by the definition of , the choice of and the equivalence of norms, we have
[TABLE]
where the last inequality holds for sufficiently small. Hence, and
[TABLE]
Denoting by the support of , it holds for any estimator of that
[TABLE]
since on . Set
[TABLE]
and . It follows from the triangle inequality that
[TABLE]
Thus,
[TABLE]
where denotes the Hamming distance. Therefore,
[TABLE]
In order to apply Lemma A.2, we need to bound the Kullback-Leibler divergence between two different distributions and of the private sample resulting from the sample if, for all , is distributed according to , with . We write if has density . Using Lemma A.1 we obtain for any channel providing local -differential privacy that
[TABLE]
Now, since and , there exists such that
[TABLE]
which implies that
[TABLE]
Applying Lemma A.2 from the appendix with implies
[TABLE]
This implies the statement of the proposition since and the channel distribution were arbitrary.
A.2. Proof of Proposition 2.2
We consider and as in the proof of Proposition 2.1, but consider now the set
[TABLE]
where is chosen such that $2^{j}\simeq\big(\frac{n(e^{\alpha}-1)^{2}}{\log(n(e^{\alpha}-1)^{2})}\big)^{\frac{1}{2(s+1-1/p)}}$ and for sufficiently small. Let us first check that this choice of and guarantees that . First, we have and one can easily check that and for all . Then, for any , we have
[TABLE]
for sufficiently small. Furthermore, for any ,
[TABLE]
for sufficiently small. Hence, and
[TABLE]
Now, we show that for , , the hypotheses and , as well as the hypotheses and , are sufficiently separated in the sense of Lemma A.3. For such we have:
[TABLE]
For , let be the distribution of the private sample resulting from the sample if for all is distributed according to . For all we have . It remains to bound the quantity We write if has density , . First consider the total variation distance between and for :
[TABLE]
and thus
[TABLE]
Applying Lemma A.1 gives
[TABLE]
Now, and
[TABLE]
for sufficiently large, say . Putting this estimate into (A.1) yields
[TABLE]
for and for sufficiently small. We can then apply Lemma A.3, which yields for that
[TABLE]
The statement of the proposition follows since both the estimator and the privacy mechanism considered were arbitrary.
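The resolution level used in the sparse case above, with 2^j of order (n(e^α − 1)² / log(n(e^α − 1)²))^{1/(2(s+1−1/p))}, can be evaluated numerically as in the following sketch. The relation only holds up to constants, so rounding to the nearest integer is an assumption of this example, and the function name is hypothetical.

```python
import math


def sparse_case_level(n, alpha, s, p):
    """Integer j with 2^j close to (N / log N)^(1 / (2 (s + 1 - 1/p))), N = n (e^alpha - 1)^2."""
    effective_n = n * (math.exp(alpha) - 1.0) ** 2
    if effective_n <= 1.0:
        raise ValueError("n * (e^alpha - 1)^2 must exceed 1 so that its logarithm is positive")
    exponent = 1.0 / (2.0 * (s + 1.0 - 1.0 / p))
    target = (effective_n / math.log(effective_n)) ** exponent
    # Rounding is a convention chosen for this sketch; the proof fixes j only up to constants.
    return max(0, round(math.log2(target)))
```

Since the level depends on n and α only through the product n(e^α − 1)², halving a small α has roughly the same effect as dividing the sample size by four.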
A.3. Further auxiliary results for the lower bound proofs
The following lemma is a Kullback-Leibler version of Assouad’s lemma. As above, we denote by the Hamming distance, that is, for .
Lemma A.2 ([Tsy09], p. 118, Theorem 2.12).
Denote with the set of all binary sequences of length . Let be a set of probability measures on some measurable space and let the corresponding expectations be denoted by . Then
[TABLE]
provided that for all with .
For the lower bound in the sparse case we need the following lemma taken from [Tsy09].
Lemma A.3 ([Tsy09], p. 101, Theorem 2.7).
Assume that and suppose that contains elements such that:
- (i) , for all ,
- (ii) , for all , and
[TABLE]
with and , . Then
[TABLE]
Appendix B Proofs of Section 4
B.1. Proof of Proposition 4.1
We give the proof for only, which is based on Statement (ii) from Lemma B.1. The proof for follows similarly using (i) instead. We decompose the risk of the estimator into approximation and stochastic error:
[TABLE]
The approximation term can be dealt with exactly as in the case of non-private data (see [Här+98], p. 130),
[TABLE]
and it remains to consider the stochastic term. Putting and , we have
[TABLE]
which can be rewritten as
[TABLE]
where . We further decompose
[TABLE]
The first term on the right-hand side is analysed as in the non-private setup (see [Här+98], p. 130) leading to the bound
[TABLE]
For the remaining term, we have by Tonelli’s theorem
[TABLE]
where is some compact set the length of which depends on and only. The expectation inside the integral is bounded from above by Rosenthal’s inequality (Statement (ii) of Lemma B.1):
[TABLE]
Recall the definition of and note that, thanks to the boundedness of the support of the wavelet parents and , we have for any and fixed that only for a finite number of that is independent of . Thus, using the last expression, we bound from above as follows
[TABLE]
Thus,
[TABLE]
Combining (B.1) and (B.2) yields
[TABLE]
which proves (4.1). Choosing as in (4.2) immediately implies (4.3).
B.2. Proof of Corollary 4.1
We distinguish between the cases and .
1. Case:
In this case, . Let us consider the estimator with chosen as in Proposition 4.1. First note that there exists a constant such that the Lebesgue measure of is bounded from above by a constant . Then, applying Hölder’s inequality and Proposition 4.1 yields
[TABLE]
2. Case:
In this case, . Thanks to the Besov embedding , it holds , which implies . Thus, again using the upper bound for the matched case from Proposition 4.1,
[TABLE]
which is the desired bound for the case .
B.3. Inequalities for moments of sums of independent random variables
Lemma B.1.
Let be independent centred random variables with .
- (i) If , then
[TABLE]
- (ii) If , then there exists a constant such that
[TABLE]
Inequality (i) follows directly from Jensen’s inequality and concavity of for . For a proof of inequality (ii) we refer to [Pet95], p. 59, Theorem 2.9.
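The Jensen argument behind inequality (i) can be spelled out in one line. The display of the lemma is not reproduced above, so the block below assumes that (i) reads as a bound of the p-th absolute moment of the sum by the (p/2)-th power of the sum of second moments, for 1 ≤ p ≤ 2; the symbols $S$ and $Y_i$ are notation chosen for this sketch.

```latex
% Assumed statement of (i): for independent centred Y_1, ..., Y_n and 1 <= p <= 2.
\[
  \mathbb{E}\Big|\sum_{i=1}^{n} Y_i\Big|^{p}
  = \mathbb{E}\big[(S^{2})^{p/2}\big]
  \le \big(\mathbb{E}[S^{2}]\big)^{p/2}
  = \Big(\sum_{i=1}^{n} \mathbb{E}[Y_i^{2}]\Big)^{p/2},
  \qquad S := \sum_{i=1}^{n} Y_i .
\]
% The inequality is Jensen's inequality for the concave map x -> x^{p/2} on [0, \infty);
% the last equality uses independence and centring, so that E[Y_i Y_k] = 0 for i \neq k.
```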
Appendix C Proof of Theorem 5.1
This section is devoted to the proof of Theorem 5.1. The main reasoning is given in Section C.1 but some tedious calculations for this proof are deferred to Section C.2. Sections C.3 and C.4 contain auxiliary results used in Section C.2.
C.1. Proof of Theorem 5.1
As in the proof of Corollary 4.1, we note that it is sufficient to prove the result for ; the result for can then be deduced as in the proof of that corollary.
We consider the upper bound where
[TABLE]
We consider the risk bounds for , , and separately.
Upper bound for the term :
Putting it holds
[TABLE]
The first term on the right-hand side is bounded by the compact support assumption on and using Lemma 1 from [Don+96] as in the non-private case (see [Don+96], p. 522):
[TABLE]
Concerning the second term, first, by Fubini’s theorem
[TABLE]
and the integrand on the right-hand side can be bounded as follows: for ,
[TABLE]
whereas for ,
[TABLE]
Thus, altogether,
[TABLE]
Hence, for our choice of and thanks to from Assumption 1.1, we obtain
[TABLE]
and the bound on the right-hand side is the claimed rate.
Upper bound for the term :
We consider the sets
[TABLE]
and the decomposition
[TABLE]
Appropriate bounds for the four terms are derived in Appendix C.2.
Upper bound for the term :
In the case we consider, , we use the embedding , where we recall that . Then, it holds
[TABLE]
Moreover, with our choice of ,
[TABLE]
and the sum on the right-hand side is bounded from above by the claimed rate.
C.2. Bounds for the terms , , , and
Consider the event defined via . The concentration inequality (C.5) for this event as well as the bound (C.6) will be used frequently in the sequel without further reference. In the following, we bound the terms , , , and separately.
C.2.1. Bound for
By the Cauchy-Schwarz inequality and the fact that ,
[TABLE]
and this term is bounded from above by the claimed rate provided that .
C.2.2. Bound for
Using the relation , we obtain
[TABLE]
In the considered case , we exploit the embedding with to get the bound
[TABLE]
Hence,
[TABLE]
by the definition of . Noting that
[TABLE]
provided that is large enough ( is sufficient), shows that is at most of the same order as the claimed rate.
C.2.3. Bound for
Put and . Note that . For any , it holds
[TABLE]
As this argument shows, one can even choose distinct values of for different , which will be used in the following calculations. Note that
[TABLE]
and, if , by Hölder’s inequality
[TABLE]
In the sequel, we consider three different cases corresponding to the three regimes in the statement of Theorem 5.1.
1. Case:
Bound for : Set and define such that
[TABLE]
Choosing for the indices , we obtain (note that )
[TABLE]
Choosing for indices , we obtain
[TABLE]
Bound for : Set and define such that
[TABLE]
Choosing for the indices , we obtain (note that )
[TABLE]
Choosing for indices , we obtain
[TABLE]
2. Case:
Bound for : The sum can be dealt with as in the first case, since the choices of and from that case are still legitimate for .
Bound for : In order to bound in the second case, define and via the relations
[TABLE]
To deal with the sum over , we take and obtain
[TABLE]
For the sum over indices , we choose , and obtain by monotonicity of -norms in , and putting , that
[TABLE]
3. Case:
Bound for : Put
[TABLE]
and choose such that
[TABLE]
Then, taking for the indices in the first sum in (C.1), we obtain
[TABLE]
For the sum over indices , we choose , and obtain by monotonicity of -norms in , and putting , that
[TABLE]
Bound for : can be dealt with exactly as in the second case.
C.2.4. Bound for
For any
[TABLE]
This term can be bounded from above by the right-hand side of (C.1), and we conclude in the same way as for the term .
C.3. A concentration inequality for the
For our proof, we need concentration inequalities for the events
[TABLE]
for , where and . Let us recall the two-sided Bernstein inequality (cf. [BLM13], Theorem 2.10).
Theorem C.1.
Let be independent real valued random variables. Assume that there exist some positive numbers and such that
[TABLE]
and for all integers
[TABLE]
Let , then for every positive
[TABLE]
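A quick simulation illustrates how such a bound is used. The sketch below assumes the usual two-sided form of the inequality, P(|S| ≥ √(2vt) + ct) ≤ 2e^{−t}, where S is the centred sum; the display of Theorem C.1 is not reproduced here, so this form is an assumption. It uses Uniform[−1, 1] variables, for which v = n/3 bounds the sum of second moments and c = 1/3 is admissible because the variables are bounded by 1. Function names are hypothetical.

```python
import math
import random


def bernstein_threshold(v, c, t):
    """Deviation level sqrt(2 v t) + c t appearing in the assumed Bernstein bound."""
    return math.sqrt(2.0 * v * t) + c * t


def empirical_tail(n, t, n_rep, rng):
    """Monte Carlo estimate of P(|S| >= sqrt(2 v t) + c t) for S a sum of n Uniform[-1, 1]."""
    v = n / 3.0      # sum of E[X_i^2] for X_i ~ Uniform[-1, 1]
    c = 1.0 / 3.0    # bounded variables |X_i| <= b = 1 satisfy the moment condition with c = b / 3
    threshold = bernstein_threshold(v, c, t)
    hits = 0
    for _ in range(n_rep):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        if abs(s) >= threshold:
            hits += 1
    return hits / n_rep
```

In such a simulation the empirical tail typically sits well below 2e^{−t}, which reflects that Bernstein-type bounds are not tight for moderate t.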
Using this inequality, we can prove the following result.
Proposition C.1.
For all satisfying , for all , and for all we have
[TABLE]
where is an upper bound for and appears in the privacy mechanism (3.2).
Remark C.2.
By Equation (15) in [Don+96], the choice is admissible for .
Proof.
We will apply Bernstein’s inequality to the random variables . Using that and are independent and that , we get for all
[TABLE]
where , which depends on , is such that for all in with . Let be an integer. Using again that and are independent we get for all
[TABLE]
Conditions (C.2) and (C.3) are thus satisfied with and , and according to Bernstein’s inequality (C.4) we have for all
[TABLE]
Note that we have for all ,
[TABLE]
where appears in the definition of in (3.2). Take , and note that if . Consequently, we get for all , for all satisfying and for all ,
[TABLE]
Then, it suffices to take to obtain (C.5). ∎
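The reason a Bernstein-type moment condition is natural for Laplace-perturbed coefficients is the factorial growth of the absolute moments of the Laplace distribution. The following computation is a standard fact, stated for a generic Laplace variable $W$ with scale $b$; the symbols $W$, $b$ and $q$ are notation chosen for this note and are not taken from the proof above.

```latex
% W has density (2b)^{-1} exp(-|x|/b), so |W| is exponentially distributed with scale b.
\[
  \mathbb{E}|W|^{q}
  = \int_{0}^{\infty} x^{q}\, \frac{1}{b}\, e^{-x/b}\, \mathrm{d}x
  = b^{q}\, \Gamma(q+1)
  = q!\, b^{q}
  = \frac{q!}{2}\,\big(2b^{2}\big)\, b^{q-2},
  \qquad q \in \mathbb{N},\ q \ge 2 .
\]
% Since E[W^2] = 2 b^2, the q-th absolute moment has exactly the shape (q!/2) * variance * c^{q-2} with c = b.
```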
C.4. Moment bounds and norm inequalities
Consider an arbitrary random function
[TABLE]
Putting
[TABLE]
it has been shown in [Don+96] that for arbitrary and it holds
[TABLE]
As in [Don+96], adopting the formal convention , it suffices to consider the second inequality for all (setting for the case ). Thus, for any ,
[TABLE]
Consider again the decomposition . We have, for any ,
[TABLE]
In [Don+96], p. 520, Equation (16) it is shown that
[TABLE]
provided that with a constant depending only on , , , , and . In addition, by Rosenthal’s inequality, it can be shown for any that
[TABLE]
Combining (C.7) and (C.8) yields
[TABLE]
Acknowledgements
The authors gratefully acknowledge financial support from GENES. Cristina Butucea and Martin Kroll also gratefully acknowledge financial support from the French National Research Agency (ANR) under the grant Labex Ecodec (ANR-11-LABEX-0047).
- [BLM13] Stéphane Boucheron, Gábor Lugosi and Pascal Massart “Concentration inequalities: A nonasymptotic theory of independence”, with a foreword by Michel Ledoux. Oxford University Press, Oxford, 2013, pp. x+481. DOI: 10.1093/acprof:oso/9780199535255.001.0001
- [DJW13] John C. Duchi, Michael I. Jordan and Martin J. Wainwright “Local privacy and minimax bounds: sharp rates for probability estimation”, 2013. URL: https://arxiv.org/abs/1305.6000
- [DJW18] John C. Duchi, Michael I. Jordan and Martin J. Wainwright “Minimax optimal procedures for locally private estimation”. In J. Amer. Statist. Assoc. 113.521, 2018, pp. 182–201. DOI: 10.1080/01621459.2017.1389735
- [DL91] David L. Donoho and Richard C. Liu “Geometrizing rates of convergence. II, III”. In Ann. Statist. 19.2, 1991, pp. 633–667, 668–701. DOI: 10.1214/aos/1176348114
- [Don+96] David L. Donoho, Iain M. Johnstone, Gérard Kerkyacharian and Dominique Picard “Density estimation by wavelet thresholding”. In Ann. Statist. 24.2, 1996, pp. 508–539. DOI: 10.1214/aos/1032894451
- [GN16] Evarist Giné and Richard Nickl “Mathematical foundations of infinite-dimensional statistical models”, Cambridge Series in Statistical and Probabilistic Mathematics, [40]. Cambridge University Press, New York, 2016, pp. xiv+690. DOI: 10.1017/CBO9781107337862
- [Här+98] Wolfgang Härdle, Gerard Kerkyacharian, Dominique Picard and Alexander Tsybakov “Wavelets, approximation, and statistical applications”, Lecture Notes in Statistics 129. Springer-Verlag, New York, 1998, pp. xviii+265. DOI: 10.1007/978-1-4612-2222-4
- [HRW13] Rob Hall, Alessandro Rinaldo and Larry Wasserman “Differential privacy for functions and functional data”. In J. Mach. Learn. Res. 14, 2013, pp. 703–727
