This paper develops conditions under which estimators of the Kullback-Leibler divergence, based on k-nearest neighbor statistics, are asymptotically unbiased and consistent for probability measures in R^d, including Gaussian measures.
Contribution
It introduces new asymptotic unbiasedness and consistency results for Kullback-Leibler divergence estimators using k-nearest neighbor methods, applicable to Gaussian measures.
Findings
01
Estimates are asymptotically unbiased under wide conditions.
02
Estimates are L^2-consistent for a broad class of probability measures.
03
New results on Kozachenko-Leonenko entropy estimators are derived.
Abstract
Wide conditions are provided to guarantee asymptotic unbiasedness and L^2-consistency of the introduced estimates of the Kullback-Leibler divergence for probability measures in R^d having densities w.r.t. the Lebesgue measure. These estimates are constructed by means of two independent collections of i.i.d. observations and involve the specified k-nearest neighbor statistics. In particular, the established results are valid for estimates of the Kullback-Leibler divergence between any two Gaussian measures in R^d with nondegenerate covariance matrices. As a byproduct we obtain new statements concerning the Kozachenko-Leonenko estimators of the Shannon differential entropy.
*Dept. of Mathematics and Mechanics, Lomonosov Moscow State University,
Moscow 119234, Russia*
1 Introduction
The Kullback-Leibler divergence plays an important role in various domains such as statistical inference (see, e.g., [25], [28]), machine learning ([5], [32]), computer vision ([11], [13]), network security ([23], [44]), feature selection and classification ([22], [29], [41]), physics ([17]), biology ([9]), finance ([45]), among others.
Recall that this divergence measure between probabilities P and Q on a space (S,B) is defined by
$$D(P\|Q) := \int_S \log\frac{dP}{dQ}\,dP \quad \text{if } P \ll Q, \tag{1.1}$$
where dP/dQ stands for the Radon-Nikodym derivative. Otherwise, D(P∣∣Q) := +∞.
We employ the base e of logarithms (a constant factor is not essential here).
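It is worth emphasizing that mutual information, widely used in many research directions, is a special case of the Kullback-Leibler divergence: the mutual information of two random vectors is the divergence between their joint distribution and the product of their marginal distributions.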
For comparison of various f-divergence measures see [34].
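If (S,B) = (R^d, B(R^d)) and P and Q are absolutely continuous, with densities p(x) and q(x), x ∈ R^d, w.r.t. the Lebesgue measure μ, then (1.1) can be rewritten as
$$D(P\|Q) = \int_{\mathbb R^d} p(x)\log\frac{p(x)}{q(x)}\,dx \quad \text{if } P \ll Q; \tag{1.2}$$
otherwise, D(P∣∣Q) = +∞.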
To simplify notation we write dx instead of μ(dx). We formally set
0/0:=0,
0⋅log0:=0.
For a (version of) probability density f denote by
S(f):={x∈Rd:f(x)>0} its support. Clearly, the integral in (1.2)
is taken over S(p).
Observe that when P≪μ and Q≪μ then P≪Q if and only if P(S(p)∖S(q))=0.
Formula (1.2) is closely related to
cross-entropy and the Shannon differential entropy.
Usually one has to reconstruct the measures (describing a stochastic model under consideration) or their characteristics using some collections of observations. In the pioneering paper [19] the estimator of the Shannon differential entropy was proposed, based on the nearest neighbor statistics. In a series of papers this estimate was
studied and applied. Moreover, estimators of the Rényi entropy, mutual information and the Kullback - Leibler divergence have appeared (see, e.g., [20], [21], [42]).
However, the authors of [27] indicated the occurrence of gaps in the known proofs concerning the limit behavior of such statistics. This issue attracted our attention and motivated our study of the declared asymptotic properties. Thus in the recent work [7] new functionals were introduced to prove asymptotic unbiasedness and L2-consistency of the Kozachenko-Leonenko estimators of the Shannon differential entropy.
The present paper aims to extend our approach to the estimation of the Kullback-Leibler divergence. Instead of the nearest neighbor statistics we employ the k-nearest neighbor statistics (on order statistics see, e.g., [3]) and also use more general forms of the mentioned functionals.
Let X and Y be random vectors taking values in Rd and having distributions PX and PY, respectively (further we consider P=PX and Q=PY). Consider i.i.d. random vectors X1,X2,…, and i.i.d. random vectors Y1,Y2,…, with law(X1)=law(X) and law(Y1)=law(Y). Assume that {Xi,Yi,i∈N} are independent. We are interested in statistical estimation
of D(PX∣∣PY) constructed by means of observations Xn:={X1,…,Xn} and Ym:={Y1,…,Ym}, n,m∈N.
All random variables under consideration are defined on a complete probability space (Ω,F,P).
For a finite set E = {z_1,…,z_N} ⊂ R^d, where z_i ≠ z_j (i ≠ j), and a vector v ∈ R^d, renumber the points of E as z_(1)(v),…,z_(N)(v) in such a way that ∥v − z_(1)(v)∥ ≤ … ≤ ∥v − z_(N)(v)∥, where ∥⋅∥ is the Euclidean norm in R^d. If there are points z_{i_1},…,z_{i_s} having the same distance from v, then we number them in increasing order of the indices i_1,…,i_s.
In other words, for k = 1,…,N, z_(k)(v) is the k-NN (k-th nearest neighbor) of v in the set E. To indicate that z_(k)(v) is constructed by means of E we write z_(k)(v,E).
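Fix k ∈ {1,…,n−1}, l ∈ {1,…,m} and (for each ω ∈ Ω) put
$$R_{n,k}(i) := \|X_i - X_{(k)}(X_i,\mathbb X_n\setminus\{X_i\})\|, \qquad V_{m,l}(i) := \|X_i - Y_{(l)}(X_i,\mathbb Y_m)\|, \qquad i = 1,\dots,n.$$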
We assume that X and Y have densities p = dP_X/dμ and q = dP_Y/dμ. Then with probability one all points in X_n are distinct, and so are the points of Y_m.
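Introduce an estimate of D(P_X∣∣P_Y), for n ≥ k+1 and m ≥ l, letting
$$D_{n,m}(k,l) := \frac1n\sum_{i=1}^n \log\frac{m\,V_{m,l}^d(i)}{(n-1)\,R_{n,k}^d(i)} + \psi(k) - \psi(l). \tag{1.3}$$
Here ψ(t) = (d/dt) log Γ(t) = Γ′(t)/Γ(t), t > 0, is the digamma function.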
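To make (1.3) concrete, here is a minimal Python sketch of D_{n,m}(k,l). It is our illustration, not code from the paper; it assumes NumPy and SciPy, uses `scipy.spatial.cKDTree` for exact nearest-neighbor distances and `scipy.special.digamma` for ψ, and the function name `knn_kl_divergence` is hypothetical. Ties between sample points occur with probability zero for absolutely continuous laws, so the sketch does not handle them.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def knn_kl_divergence(x, y, k=1, l=1):
    """k-NN estimate of D(P_X || P_Y) in nats, following (1.3).

    x: sample of shape (n, d) from P_X; y: sample of shape (m, d) from P_Y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, d = x.shape
    m = y.shape[0]
    if y.shape[1] != d:
        raise ValueError("x and y must have the same dimension")
    if not (1 <= k <= n - 1 and 1 <= l <= m):
        raise ValueError("need 1 <= k <= n - 1 and 1 <= l <= m")
    # R_{n,k}(i): X_i is returned as its own nearest point (distance 0),
    # so we ask for k + 1 neighbours and keep the last one.
    r = cKDTree(x).query(x, k=k + 1)[0].reshape(n, -1)[:, -1]
    # V_{m,l}(i): distance from X_i to its l-th nearest neighbour in the Y-sample.
    v = cKDTree(y).query(x, k=l)[0].reshape(n, -1)[:, -1]
    log_terms = np.log(m / (n - 1)) + d * (np.log(v) - np.log(r))
    return float(log_terms.mean() + digamma(k) - digamma(l))


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(20000, 1))  # P_X = N(0, 1)
y = rng.normal(1.0, 1.0, size=(20000, 1))  # P_Y = N(1, 1); exact D = 0.5
print(knn_kl_divergence(x, y, k=1, l=1))
print(knn_kl_divergence(x, y, k=3, l=3))
```

For this Gaussian pair the exact value is D = 0.5, so the printed numbers should lie close to 0.5; a single run like this is an illustration only.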
All our results will be valid for the following generalization of statistics Dn,m(k,l):
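$$D_{n,m}(\mathcal K_n,\mathcal L_n) := \frac1n\sum_{i=1}^n\left(\log\frac{m\,V_{m,l_i}^d(i)}{(n-1)\,R_{n,k_i}^d(i)} + \psi(k_i) - \psi(l_i)\right), \tag{1.4}$$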
where Kn:={ki}i=1n, Ln:={li}i=1n and, for some r∈N and all i∈N, ki≤r, li≤r. Note that (1.4) is well-defined for n≥maxi=1,…,nki+1, m≥maxi=1,…,nli. We will only consider the estimates (1.3) since the study of Dn,m(Kn,Ln) follows the same lines.
Developing the approach of [7] to the analysis of the asymptotic behavior of the Kozachenko-Leonenko estimates of the Shannon differential entropy (introduced in [35], Part III, Section 20), we encounter new complications due to dealing with k-nearest neighbor statistics for k ∈ N (not only for k = 1). Accordingly, in the framework of the Kullback-Leibler divergence estimation, we propose a new way to bound the function 1 − F_{m,l,x}(u), which plays the key role in the proofs (see formula (3.10)). Also, instead of the function G(t) = t log t (for t > 1), used in [7] to study the Shannon entropy estimates, we employ a regularly varying function G_N(t) = t log_[N](t), where (for t large enough) log_[N](t) is the N-fold iteration of the logarithmic function and N ∈ N is chosen arbitrarily. Hence, in the definition of the integral functional K_{p,q}(ν,N,t) by formula (2.4) below, one can take a function G_N(z) whose growth rate (as z → ∞) is close to that of the function z. Moreover, this permits a generalization of the results of [7]. Here we invoke the convexity of G_N (see Lemma 6) to provide simpler conditions for the asymptotic unbiasedness and L2-consistency of the Shannon differential entropy estimates than those employed in [7].
We mention in passing that there are investigations treating other important aspects of mutual information and entropy estimation.
In [1] entropy estimators are applied to the detection of inhomogeneities in fiber materials. The mixed models and conditional entropy estimation are studied, e.g., in
[8], [10]. The central limit theorem for the Kozachenko-Leonenko estimates is established in [12]. The limit theorems for point processes on manifolds are employed
in [30] to analyze behavior of the Shannon and the Rényi entropy estimates.
The convergence rates for the (truncated) Shannon entropy estimates are obtained in [40] for the one-dimensional case; see also [37] for the multidimensional case.
Ensemble estimation of density functionals is considered in [38].
A recursive rectilinear partitioning for the differential entropy is considered in
[39]. The mutual information estimation by the local Gaussian approximation is developed in [16].
Note that various deep results (including the central limit theorem) were obtained for the Kullback-Leibler estimates under certain conditions imposed on derivatives of unknown densities (see, e.g., the recent papers [2], [24], [33]). Our goal is to provide wide conditions for the asymptotic unbiasedness and L2-consistency of the Kullback-Leibler divergence estimates (1.3), as n, m → ∞, without such smoothness hypotheses. Also we do not assume that the densities have bounded supports.
The paper is organized as follows. In Section 2 we formulate main results, Theorems 1 and 2. Their proofs are presented in
Sections 3 and 4, respectively. Proofs of several lemmas are given in Appendix (Section 5).
2 Main results
Some notation is necessary.
For a probability density f in Rd,
x∈Rd, r>0 and R>0, as in [7], introduce the functions (or functionals depending on parameters)
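$$I_f(x,r) := \frac{1}{V_d r^d}\int_{B(x,r)} f(y)\,dy, \tag{2.1}$$
$$M_f(x,R) := \sup_{r\in(0,R]} I_f(x,r), \qquad m_f(x,R) := \inf_{r\in(0,R]} I_f(x,r), \tag{2.2}$$
where B(x,r) := {y ∈ R^d : ∥x − y∥ ≤ r} and V_d := π^{d/2}/Γ(d/2 + 1) is the volume of the unit ball in R^d.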
Observe that replacing sup_{r∈(0,R]} by sup_{r∈(0,∞)} in the definition of M_f(x,R) leads to the celebrated Hardy-Littlewood maximal function M_f(x), widely used in harmonic analysis.
Some properties
of the function ∫B(x,r)f(y)dy are considered, e.g., in [14].
According to Lemma 2.1 [7], for a probability density f in Rd, the function If(x,r) defined in (2.1) is continuous in (x,r)∈Rd×(0,∞).
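For intuition about (2.1) and (2.2), the following Python sketch evaluates I_f, M_f and m_f for the standard normal density in d = 1, where V_1 = 2 and the ball B(x,r) is the interval [x − r, x + r]. The density, the radii and the replacement of sup/inf over (0,R] by a finite grid are our own illustrative assumptions, not part of the paper. By the Lebesgue differentiation theorem one expects m_f(x,R) ≤ f(x) ≤ M_f(x,R), up to the grid error.

```python
import numpy as np
from scipy.stats import norm

V_1 = 2.0  # length of the unit "ball" [-1, 1] in R^1


def I_f(x, r):
    """(2.1) for the standard normal density in d = 1: mean of f over [x - r, x + r]."""
    return (norm.cdf(x + r) - norm.cdf(x - r)) / (V_1 * r)


def _radii(R, num):
    # Finite grid on (0, R]; approximating sup/inf over (0, R] this way is an assumption.
    return np.linspace(R / num, R, num)


def M_f(x, R, num=2000):
    """Grid approximation of M_f(x, R) from (2.2)."""
    return float(np.max(I_f(x, _radii(R, num))))


def m_f(x, R, num=2000):
    """Grid approximation of m_f(x, R) from (2.2)."""
    return float(np.min(I_f(x, _radii(R, num))))


for x0 in (0.0, 2.5):
    print(x0, m_f(x0, 1.0), norm.pdf(x0), M_f(x0, 1.0))
```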
Set e[0]:=1 and e[N]:=exp{e[N−1]}, N∈N.
Introduce a function log[1](t):=logt, t>0.
For N∈N, N>1, set
log[N](t):=log(log[N−1](t)).
Evidently, this function (for N>1) is defined if t>e[N−2].
For N∈N, consider the continuous nondecreasing function GN:R+→R+, given by formula
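$$G_N(t) := \begin{cases} 0, & t \in [0, e_{[N-1]}],\\ t\log_{[N]}(t), & t \in (e_{[N-1]}, \infty). \end{cases} \tag{2.3}$$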
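The tower e_[N], the iterated logarithm log_[N] and G_N from (2.3) can be coded directly. The Python sketch below is illustrative only; the helper names `e_tower`, `log_iter` and `G` are hypothetical. Since e_[4] already overflows double precision, the sketch works only for N ≤ 4.

```python
import math


def e_tower(N):
    """e_[0] = 1 and e_[N] = exp(e_[N-1])."""
    value = 1.0
    for _ in range(N):
        value = math.exp(value)
    return value


def log_iter(t, N):
    """N-fold iterated logarithm log_[N](t); math.log raises ValueError outside the domain."""
    for _ in range(N):
        t = math.log(t)
    return t


def G(t, N):
    """G_N(t) from (2.3): zero on [0, e_[N-1]] and t * log_[N](t) beyond that point."""
    if t < 0:
        raise ValueError("G_N is defined on [0, inf)")
    if t <= e_tower(N - 1):
        return 0.0
    return t * log_iter(t, N)


# G_N(t) / t = log_[N](t) grows more slowly as N increases.
for N in (1, 2, 3):
    print(N, e_tower(N - 1), G(1e6, N) / 1e6)
```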
For probability densities p,q in Rd, some N∈N and positive constants ν,t,ε,R, we define the following functionals
with values in [0,∞]
[TABLE]
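$$Q_{p,q}(\varepsilon,R) := \int_{\mathbb R^d} M_q^{\varepsilon}(x,R)\,p(x)\,dx, \tag{2.5}$$
$$T_{p,q}(\varepsilon,R) := \int_{\mathbb R^d} m_q^{-\varepsilon}(x,R)\,p(x)\,dx. \tag{2.6}$$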
Set Kp,q(ν,N):=Kp,q(ν,N,e[N]). Clearly, for any N∈N, ν,t,u>0
such that t<u, one has
[TABLE]
Remark 3
We stipulate that
1/0 := ∞ (consequently m_q^{−ε}(x,R) := ∞ when m_q(x,R) = 0).
For arbitrary versions of p and q, we can write in
(2.5), (2.6) the integrals over the support S(p) instead of integrating over Rd (obviously, the results do not depend on the choice of versions).
Theorem 1
Let PX and PY have densities p and q, respectively.
Suppose that p and q are such that, for some εi>0,Ri>0 and Nj∈N, where i=1,2,3,4 and j=1,2, the functionals Kp,q(1,N1), Qp,q(ε1,R1), Tp,q(ε2,R2), Kp,p(1,N2), Qp,p(ε3,R3), Tp,p(ε4,R4) are finite.
Then, for any fixed k,l∈N, the estimates Dn,m(k,l), introduced in (1.3), are asymptotically unbiased, i.e.
$$\lim_{n,m\to\infty}\mathsf E\,D_{n,m}(k,l) = D(P_X\|P_Y). \tag{2.8}$$
Remark 4
It is useful to note that if Qp,q(ε1,R1)<∞ and Tp,q(ε2,R2)<∞ for some positive ε1,ε2,R1,R2 then ∫Rdp(x)∣logq(x)∣dx<∞.
Indeed, definition (2.2)
and the Lebesgue differentiation theorem (see, e.g., Theorem 25.17 [43])
yield that mq(x,R2)≤q(x)≤Mq(x,R1) for μ-almost all x∈Rd.
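Evidently, log z ≤ (1/ε) z^ε for any z ≥ 1 and each ε > 0. Consequently, splitting the integral over the sets {q ≥ 1} and {q < 1},
$$\int_{\mathbb R^d} p(x)|\log q(x)|\,dx \le \frac{1}{\varepsilon_1}\int_{\mathbb R^d} M_q^{\varepsilon_1}(x,R_1)\,p(x)\,dx + \frac{1}{\varepsilon_2}\int_{\mathbb R^d} m_q^{-\varepsilon_2}(x,R_2)\,p(x)\,dx < \infty.$$
So the finiteness of the integrals Q_{p,q}(ε_1,R_1), T_{p,q}(ε_2,R_2), Q_{p,p}(ε_3,R_3), T_{p,p}(ε_4,R_4) implies that the integral in (1.2) is finite (and also guarantees that P_X ≪ P_Y).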
Lemma 1
Let p and q be any probability densities in Rd.
Then the following statements are valid.
1) If K_{p,q}(ν_0,N_0) < ∞ for some ν_0 > 0 and N_0 ∈ N, then K_{p,q}(ν,N) < ∞ for any ν ∈ (0,ν_0] and each N ≥ N_0.
2) If Q_{p,q}(ε_1,R_1) < ∞ for some ε_1 > 0 and R_1 > 0, then Q_{p,q}(ε,R) < ∞ for any ε ∈ (0,ε_1] and each R > 0.
3) If T_{p,q}(ε_2,R_2) < ∞ for some ε_2 > 0 and R_2 > 0, then T_{p,q}(ε,R) < ∞ for any ε ∈ (0,ε_2] and each R > 0.
The proof is given in Appendix. In view of Lemma 1, one can recast Theorem 1 as follows.
Corollary 1
Let, for some positive ε,R and N∈N, the functionals Kp,q(1,N), Qp,q(ε,R), Tp,q(ε,R), Kp,p(1,N), Qp,p(ε,R), Tp,p(ε,R) be finite.
Then (2.8) holds. Moreover, we obtain the equivalent conditions assuming
that these functionals are finite for some ε>0 and R=ε.
Let us also consider the following simple conditions.
(A;p,q,ν) For probability densities p,q in Rd and some positive ν
$$L_{p,q}(\nu) := \int_{\mathbb R^d}\int_{\mathbb R^d} |\log\|x-y\||^{\nu}\,p(x)\,q(y)\,dx\,dy < \infty.$$
We formally set log0:=−∞ and, as usual,
∫Ag(z)Q(dz)=0 whenever g(z)=∞ (or −∞) for z∈A and Q(A)=0,
where Q is a σ-finite measure on (Rd,B(Rd)).
(B1;f) There exists a version of density f such that, for some M(f)∈(0,∞),
$$f(x) \le M(f), \quad x \in \mathbb R^d.$$
(C1;f) There exists a version of density f such that, for some m(f)∈(0,∞),
$$f(x) \ge m(f), \quad x \in S(f).$$
Corollary 2
Let conditions (A;p,q,ν) and (A;p,p,ν) be satisfied with some ν>1.
Then (2.8) is true, provided that (B1;f) and (C1;f) are valid for f=p and f=q.
Moreover, if the latter assumption concerning (B1;f) and (C1;f) holds then (2.8) is true whenever p and q have bounded supports.
Next we formulate conditions to guarantee L2-consistency of estimates (1.3).
Theorem 2
Let the requirements Kp,q(1,N1)<∞ and
Kp,p(1,N2)<∞ in conditions of Theorem 1 be replaced by Kp,q(2,N1)<∞ and
Kp,p(2,N2)<∞.
Then, for any fixed k,l∈N, the estimates Dn,m(k,l) are L2-consistent, i.e.
$$\lim_{n,m\to\infty}\mathsf E\big(D_{n,m}(k,l) - D(P_X\|P_Y)\big)^2 = 0. \tag{2.10}$$
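Theorem 2 is an asymptotic statement, and a small Monte Carlo experiment can only illustrate it. The Python sketch below is our own (hypothetical names `knn_kl` and `empirical_mse`; NumPy and SciPy assumed). It estimates E(D_{n,n}(1,1) − D)^2 for the Gaussian pair N(0,1), N(1,1), for which D = 0.5 and which falls under Corollary 5. The sample sizes, number of replications and seed are arbitrary choices.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def knn_kl(x, y, k=1, l=1):
    """D_{n,m}(k,l) from (1.3) for 2-D arrays x of shape (n, d) and y of shape (m, d)."""
    n, d = x.shape
    m = y.shape[0]
    r = cKDTree(x).query(x, k=k + 1)[0].reshape(n, -1)[:, -1]  # R_{n,k}(i)
    v = cKDTree(y).query(x, k=l)[0].reshape(n, -1)[:, -1]  # V_{m,l}(i)
    return np.log(m / (n - 1)) + d * np.mean(np.log(v) - np.log(r)) + digamma(k) - digamma(l)


def empirical_mse(n, reps=20, seed=1):
    """Monte Carlo estimate of E(D_{n,n}(1,1) - D)^2 for N(0,1) versus N(1,1), where D = 0.5."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, size=(n, 1))
        y = rng.normal(1.0, 1.0, size=(n, 1))
        errors.append((knn_kl(x, y) - 0.5) ** 2)
    return float(np.mean(errors))


for n in (500, 2000, 8000):
    print(n, empirical_mse(n))
```

One expects the empirical mean squared error to decrease as n grows; with only 20 replications the individual numbers are rough.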
Due to Lemma 1 one can recast Theorem 2 as follows.
Corollary 3
Let, for some positive ε,R and N∈N, the functionals Kp,q(2,N), Qp,q(ε,R), Tp,q(ε,R), Kp,p(2,N), Qp,p(ε,R), Tp,p(ε,R) be finite.
Then (2.10) holds. Moreover, we obtain the equivalent conditions assuming
that these functionals are finite for some ε>0 and R=ε.
Corollary 4
Let conditions (A;p,q,ν) and (A;p,p,ν) be satisfied with some ν>2. Assume that
(B1;f) and (C1;f) are valid for f=p and f=q.
Then (2.10) is true.
Moreover, if the latter assumption concerning (B1;f) and (C1;f) holds then (2.10) is true whenever p and q have bounded supports.
Note that D. Evans considered the “positive density condition” in Definition 2.1 of [14], meaning that there exist constants β > 1 and δ > 0 such that r^d/β ≤ ∫_{B(x,r)} q(y)dy ≤ βr^d for all 0 ≤ r ≤ δ and x ∈ R^d.
Consequently m_q(x,δ) ≥ 1/(βV_d) =: m > 0, x ∈ R^d. Then T_{p,q}(ε,δ) ≤ m^{−ε}∫_{R^d} p(x)dx = m^{−ε} < ∞ for all ε > 0. Analogously, M_q(x,δ) ≤ β/V_d =: M, x ∈ R^d, and Q_{p,q}(ε,δ) ≤ M^ε∫_{R^d} p(x)dx = M^ε < ∞ for all ε > 0.
It was proved in [15] that if f is smooth and its support is a compact convex body in Rd then the mentioned inequalities from Definition 2.1 of [14] hold. Therefore, if p and q are smooth and their supports are compact convex bodies in Rd then
one can simplify conditions of Corollaries 1 and 3.
Now instead of (C1; f) we consider the following condition introduced in [7] that allows us to work with densities, whose supports need not be bounded.
(C2;f) For a fixed R > 0, there exist a constant c > 0 and a version of a density f such that
$$m_f(x,R) \ge c\,f(x), \quad x \in \mathbb R^d.$$
Remark 5
If, for some positive ε, R and c, condition (C2;q) is true and
$$\int_{\mathbb R^d} q^{-\varepsilon}(x)\,p(x)\,dx < \infty,$$
then obviously T_{p,q}(ε,R) < ∞.
Thus in Theorems 1 and 2 one can employ, for f = p and f = q, condition (C2;f) and suppose, for some ε > 0, finiteness of ∫_{R^d} q^{−ε}(x)p(x)dx and ∫_{R^d} p^{1−ε}(x)dx instead of the corresponding assumptions T_{p,q}(ε,R) < ∞ and T_{p,p}(ε,R) < ∞. To illustrate this observation, we provide a result for densities with unbounded supports.
Corollary 5
Let X, Y be Gaussian random vectors in Rd with EX=μX, EY=μY
and nondegenerate covariance matrices ΣX and ΣY, respectively.
Then relations (2.8) and (2.10) hold where
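$$D(P_X\|P_Y) = \frac12\left(\operatorname{tr}\big(\Sigma_Y^{-1}\Sigma_X\big) + (\mu_Y-\mu_X)^{\top}\Sigma_Y^{-1}(\mu_Y-\mu_X) - d + \log\frac{\det\Sigma_Y}{\det\Sigma_X}\right).$$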
The latter formula can be found, e.g., in [25], p. 147. The proof of Corollary 5 is discussed in Appendix.
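The closed form for Gaussian measures gives a reference value against which k-NN estimates can be compared. The following Python sketch is our illustration (hypothetical function name `gaussian_kl`, NumPy assumed); it uses linear solves and `numpy.linalg.slogdet` instead of explicit inverses and determinants for numerical stability.

```python
import numpy as np


def gaussian_kl(mu_x, cov_x, mu_y, cov_y):
    """Closed-form D(N(mu_x, cov_x) || N(mu_y, cov_y)) in nats; covariances must be nondegenerate."""
    mu_x = np.atleast_1d(np.asarray(mu_x, dtype=float))
    mu_y = np.atleast_1d(np.asarray(mu_y, dtype=float))
    cov_x = np.atleast_2d(np.asarray(cov_x, dtype=float))
    cov_y = np.atleast_2d(np.asarray(cov_y, dtype=float))
    d = mu_x.shape[0]
    diff = mu_y - mu_x
    trace_term = np.trace(np.linalg.solve(cov_y, cov_x))  # tr(Sigma_Y^{-1} Sigma_X)
    quad_term = diff @ np.linalg.solve(cov_y, diff)  # (mu_Y - mu_X)^T Sigma_Y^{-1} (mu_Y - mu_X)
    _, logdet_x = np.linalg.slogdet(cov_x)
    _, logdet_y = np.linalg.slogdet(cov_y)
    return float(0.5 * (trace_term + quad_term - d + logdet_y - logdet_x))


print(gaussian_kl(0.0, 1.0, 1.0, 1.0))
print(gaussian_kl([0.0, 0.0], np.eye(2), [0.0, 0.0], 2.0 * np.eye(2)))
```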
Similarly to condition (C2;f) let us consider the following one.
(B2;f) For a fixed R > 0, there exist a constant C > 0 and a version of a density f such that
$$M_f(x,R) \le C\,f(x), \quad x \in S(f).$$
Remark 6
If, for some positive ε, R and C, condition (B2;q) is true and
$$\int_{\mathbb R^d} q^{\varepsilon}(x)\,p(x)\,dx < \infty,$$
then obviously Q_{p,q}(ε,R) < ∞.
Thus in Theorems 1 and 2 one can employ, for f = p and f = q, condition (B2;f) and suppose that ∫_{R^d} q^ε(x)p(x)dx and ∫_{R^d} p^{1+ε}(x)dx are finite (for some ε > 0) instead of the assumptions Q_{p,q}(ε,R) < ∞ and Q_{p,p}(ε,R) < ∞.
For a fixed k∈{1,…,n−1}, consider the Kozachenko - Leonenko estimate of the Shannon differential entropy H(X) of a vector X with values in Rd having a density p w.r.t. the Lebesgue measure.
Namely, H(X):=−∫Rd(logp(x))p(x)μ(dx) and, for i.i.d. observations X1,X2,…, such that law(X1)=law(X), set for all n≥k+1,
$$H_n(k) := \frac1n\sum_{i=1}^n \log\big((n-1)\,V_d\,R_{n,k}^d(i)\big) - \psi(k),$$
Similar to (1.4) one can employ the following generalization of statistics Hn(k):
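$$H_n(\mathcal K_n) := \frac1n\sum_{i=1}^n \Big(\log\big((n-1)\,V_d\,R_{n,k_i}^d(i)\big) - \psi(k_i)\Big),$$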
where Kn:={ki}i=1n, and, for some r∈N and all i∈N, ki≤r.
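A minimal Python sketch of the Kozachenko-Leonenko statistic, assuming it has the form H_n(k) = (1/n)Σ log((n−1)V_d R^d_{n,k}(i)) − ψ(k). NumPy and SciPy are assumed, the name `kl_entropy` is hypothetical, and log V_d = (d/2) log π − log Γ(d/2 + 1) is computed via `scipy.special.gammaln`. For N(0, I_d) the exact entropy is (d/2) log(2πe), which the sketch uses as a reference.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln


def kl_entropy(x, k=1):
    """Kozachenko-Leonenko estimate H_n(k) of the Shannon differential entropy (nats)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    if not 1 <= k <= n - 1:
        raise ValueError("need 1 <= k <= n - 1")
    log_vd = 0.5 * d * np.log(np.pi) - gammaln(0.5 * d + 1.0)  # log volume of the unit ball
    # R_{n,k}(i): k-th nearest neighbour among the other points (self is at distance 0).
    r = cKDTree(x).query(x, k=k + 1)[0].reshape(n, -1)[:, -1]
    return float(np.log(n - 1) + log_vd + d * np.mean(np.log(r)) - digamma(k))


rng = np.random.default_rng(0)
z = rng.standard_normal((20000, 2))
exact = np.log(2 * np.pi * np.e)  # entropy of N(0, I_2)
print(kl_entropy(z, k=1), kl_entropy(z, k=4), exact)
```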
Corollary 6
Let Qp,p(ε,R)<∞ and Tp,p(ε,R)<∞ for some positive ε and R. Then the following statements hold for any fixed k∈N.
1) If, for some N∈N, Kp,p(1,N)<∞, then
EHn(k)→H(X),n→∞.
2) If, for some N∈N, Kp,p(2,N)<∞, then
E(Hn(k)−H(X))2→0,n→∞.
In particular, one can employ the condition L_{p,p}(ν) < ∞ with ν > 1 instead of K_{p,p}(1,N) < ∞, and with ν > 2 instead of K_{p,p}(2,N) < ∞, where N ∈ N.
The proof of the first statement of this corollary is contained in the proof of Theorem 1, Step 5. In a similar way one can infer the second statement of Corollary 6 by means of the proof of Theorem 2, Step 5.
3 Proof of Theorem 1
For n, m ∈ N such that n > 1, fixed k ∈ N and l ∈ N, where 1 ≤ k ≤ n−1, 1 ≤ l ≤ m, and i = 1,…,n, set
ϕ_{m,l}(i) := m V^d_{m,l}(i), ζ_{n,k}(i) := (n−1) R^d_{n,k}(i).
Then we can rewrite the estimate Dn,m(k,l) as follows
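$$D_{n,m}(k,l) = \frac1n\sum_{i=1}^n \log\phi_{m,l}(i) - \frac1n\sum_{i=1}^n \log\zeta_{n,k}(i) - \psi(l) + \psi(k). \tag{3.1}$$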
It is sufficient to prove the following two claims.
Statement 1. For each fixed l, all m large enough and any i∈N,
E∣logϕm,l(i)∣ is finite. Moreover,
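$$\lim_{m\to\infty}\mathsf E\log\phi_{m,l}(i) = \psi(l) - \log V_d - \int_{\mathbb R^d} p(x)\log q(x)\,dx. \tag{3.2}$$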
Statement 2. For each fixed k, all n large enough and any i∈N,
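E∣log ζ_{n,k}(i)∣ is finite. Moreover,
$$\lim_{n\to\infty}\mathsf E\log\zeta_{n,k}(i) = \psi(k) - \log V_d - \int_{\mathbb R^d} p(x)\log p(x)\,dx. \tag{3.3}$$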
We are going to discuss in detail only the proof of Statement 1, since Statement 2 is established in a similar way.
It was explained in [7] that if V is a nonnegative random variable (hence EV ≤ ∞) and X is an arbitrary random vector with values in R^d, then
$$\mathsf EV = \int_{\mathbb R^d}\mathsf E(V\mid X=x)\,P_X(dx). \tag{3.4}$$
Formula (3.4) means that
simultaneously both sides are finite or infinite and coincide.
Let F(u,ω) be a regular conditional distribution function
of V given X where u∈[0,∞) and ω∈Ω.
Let h be a measurable function such that h:R→[0,∞). Then, for PX-almost all x∈Rd, it follows (without assumption Eh(V)<∞) that
[TABLE]
This means that both sides of (3.5) are finite or infinite simultaneously and coincide.
By virtue of (3.4) and (3.5) one can prove that E∣logϕm,l(i)∣<∞, for all m large enough,
fixed l and for all i∈N,
and (3.2) holds. For this purpose we take V=ϕm,l(i),
X=Xi and h(u)=∣logu∣, u>0 (we use h(u)=log2u in the proof of Theorem 2).
To save space, below we only consider the evaluation of E log ϕ_{m,l}(i), as all steps of the proof are the same when treating E∣log ϕ_{m,l}(i)∣.
We divide the proof of Statement 1 into four steps. Preliminary Steps 1-3 are devoted to the demonstration, for x∈A⊂S(p) and i∈N, of relation
$$\mathsf E(\log\phi_{m,l}(i)\mid X_i=x) \to \psi(l) - \log V_d - \log q(x), \quad m\to\infty, \tag{3.6}$$
where A depends on p and q versions, PX(S(p)∖A)=0.
Then Step 4 justifies the desired result (3.2). Step 5 contains the validation of Statement 2.
Step 1. Here we establish the distribution convergence for the auxiliary random variables.
Fix any i∈N
and l∈{1,…,m}.
To simplify notation we do not indicate the dependence of functions on d.
For x∈Rd and u>0, we study the asymptotic behavior (as m→∞) of the following function
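$$F^i_{m,l,x}(u) := \mathsf P\big(m\|x-Y_{(l)}(x,\mathbb Y_m)\|^d \le u\big) = 1 - \sum_{s=0}^{l-1}\binom{m}{s} W_{m,x}^s(u)\big(1-W_{m,x}(u)\big)^{m-s}, \tag{3.10}$$
where
$$r_m(u) := \Big(\frac um\Big)^{1/d}, \qquad W_{m,x}(u) := \int_{B(x,r_m(u))} q(y)\,dy. \tag{3.11}$$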
We have employed in (3.10) the independence of random vectors Y1,…,Ym,Xi and condition that Y1,…,Ym have the same law as Y.
We also took into account that the event {∥x − Y_(l)(x,Y_m)∥ > r_m(u)} is a union of pairwise disjoint events A_s, s = 0,…,l−1. Here A_s means that exactly s observations among Y_m belong to the ball B(x,r_m(u)) and the other m − s are outside this ball (the probability that Y belongs to the sphere {z ∈ R^d : ∥z − x∥ = r} equals 0 since Y has a density w.r.t. the Lebesgue measure μ).
Formulas (3.10) and (3.11) show that F^i_{m,l,x}(u) is the regular conditional distribution function of ϕ_{m,l}(i) given X_i = x. Moreover, (3.10) means that ϕ_{m,l}(i), i ∈ {1,…,n}, are identically distributed, so we may omit the dependence on i and replace F^i_{m,l,x}(u) with F_{m,l,x}(u).
According to the Lebesgue differentiation theorem (see, e.g., [43], p. 654) if q∈L1(Rd) then, for μ-almost all x∈Rd,
the following relation holds
$$\lim_{r\to0+}\frac{1}{V_d r^d}\int_{B(x,r)}|q(y)-q(x)|\,dy = 0. \tag{3.12}$$
Let Λ(q) stand for a set of all the Lebesgue points of a function q,
i.e. points x∈Rd satisfying (3.12). Clearly, Λ(q) depends on the chosen version of q belonging to the class of equivalent functions from L1(Rd) and, for an arbitrary version of q, we have μ(Rd∖Λ(q))=0.
Note that, for each u > 0, r_m(u) → 0 as m → ∞, and μ(B(x,r_m(u))) = V_d (r_m(u))^d = V_d u/m. Therefore, by virtue of (3.12), for any fixed x ∈ Λ(q) and u > 0,
$$W_{m,x}(u) = \frac{V_d u}{m}\big(q(x) + \alpha_m(x,u)\big),$$
where α_m(x,u) → 0, m → ∞.
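Hence, for x ∈ Λ(q) ∩ S(q) (thus q(x) > 0), due to (3.10) and the preceding relation, for each u > 0,
$$1 - F_{m,l,x}(u) \to \sum_{s=0}^{l-1}\frac{(V_d q(x) u)^s}{s!}\,e^{-V_d q(x) u}, \quad m\to\infty,$$
that is, ξ_{m,l,x} := m∥x − Y_(l)(x,Y_m)∥^d converges in law to a random variable ξ_{l,x} having the Γ(V_d q(x), l) distribution.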
We assume without loss of generality (w.l.g.) that, for all x∈S(q), the random variables ξl,x and {ξm,l,x}m≥l
are defined on a probability space (Ω,F,P) since in view of the Lomnicki - Ulam theorem (see, e.g. [18], p. 93) one can consider
the independent copies of Y1,Y2,… and {ξl,x}x∈S(q) defined on a certain probability space.
The convergence in law of random variables is preserved under continuous mapping. Hence, for any
x∈Λ(q)∩S(q), we come to the relation
$$\log\xi_{m,l,x} \xrightarrow{\text{law}} \log\xi_{l,x}, \quad m\to\infty. \tag{3.16}$$
We took into account that, for each x∈Λ(q)∩S(q), one has ξl,x>0 a.s. and
since Y has a density we infer that
P(ξ_{m,l,x} > 0) = P(∥x − Y_(l)(x,Y_m)∥ > 0) = 1.
More precisely, we can ignore zero values of nonnegative random variables (having zero values with probability zero) when we take their logarithms.
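Step 1 rests on the fact that 1 − F_{m,l,x}(u) in (3.10) is a binomial probability, P(at most l − 1 of Y_1,…,Y_m fall into B(x,r_m(u))), which converges to the Gamma survival function Σ_{s<l} e^{−V_d q(x)u}(V_d q(x)u)^s/s!. The Python sketch below illustrates this numerically under a simplifying assumption that is not in the paper: q is taken to be constant on the ball, equal to a hypothetical value q(x) = 0.3, so that W_{m,x}(u) = V_d q(x) u/m exactly. The choices d = 2, l = 3 and u = 2 are arbitrary.

```python
import numpy as np
from scipy.stats import binom, gamma

d = 2
V_d = np.pi  # area of the unit disc
q_x = 0.3  # hypothetical value of the density q at the point x
l = 3
u = 2.0


def tail_finite_m(m):
    """1 - F_{m,l,x}(u) via (3.10), assuming q is constant (= q_x) on B(x, r_m(u))."""
    w = V_d * q_x * u / m  # W_{m,x}(u) under the constant-density assumption
    return binom.cdf(l - 1, m, w)  # P(at most l - 1 of Y_1, ..., Y_m fall in the ball)


# Limit: P(xi_{l,x} > u) for xi_{l,x} ~ Gamma(shape l, rate V_d q(x)).
tail_limit = gamma.sf(u, a=l, scale=1.0 / (V_d * q_x))

for m in (10, 100, 1000, 100000):
    print(m, tail_finite_m(m), tail_limit)
```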
Step 2. Now we show that, instead of (3.6), it suffices to verify the following statement. For μ-almost every x ∈ Λ(q) ∩ S(q),
$$\mathsf E\log\xi_{m,l,x} \to \mathsf E\log\xi_{l,x}, \quad m\to\infty. \tag{3.17}$$
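Note that if η ∼ Γ(α,λ), where α > 0 is the rate and λ > 0 the shape parameter (i.e., η has the density α^λ t^{λ−1} e^{−αt}/Γ(λ), t > 0), then
$$\mathsf E\log\eta = \psi(\lambda) - \log\alpha.$$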
Set α=Vdq(x), where q(x)>0 for x∈S(q), and λ=l. Then
Elogξl,x=ψ(l)−log(Vdq(x))=ψ(l)−logVd−logq(x).
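The identity E log η = ψ(λ) − log α for η ∼ Γ(α,λ), with α the rate and λ the shape, can be checked by simulation. The following Python sketch is only a sanity check under hypothetical values d = 2, q(x) = 0.3 and l = 3; note that NumPy parametrizes the Gamma law by shape and scale = 1/rate.

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
alpha = np.pi * 0.3  # rate alpha = V_d q(x) with hypothetical d = 2, q(x) = 0.3
lam = 3  # shape lambda = l

samples = rng.gamma(shape=lam, scale=1.0 / alpha, size=1_000_000)
mc_value = float(np.log(samples).mean())  # Monte Carlo estimate of E log(eta)
exact_value = float(digamma(lam) - np.log(alpha))  # psi(lambda) - log(alpha)
print(mc_value, exact_value)
```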
By virtue of (3.5), for each x∈Rd,
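$$\mathsf E(\log\phi_{m,l}(1)\mid X_1=x) = \int_{(0,\infty)}\log u\,dF_{m,l,x}(u) = \mathsf E\log\xi_{m,l,x}.$$
Thus, for x ∈ Λ(q) ∩ S(q), the relation E(log ϕ_{m,l}(1) ∣ X_1 = x) → ψ(l) − log V_d − log q(x) holds if and only if (3.17) is true.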
According to Theorem 3.5 [4] we would have established (3.17) if relation (3.16) could be
supplemented, for μ-almost all x∈Λ(q)∩S(q), by the uniform integrability of a family {logξm,l,x}m≥m0(x).
Note that, for each N ∈ N, the function G_N(t) introduced by (2.3) is nondecreasing on [0,∞) and G_N(t)/t → ∞ as t → ∞. Therefore, by the de la Vallée Poussin theorem (see, e.g., Theorem 1.3.4 [6]), to guarantee, for μ-almost every x ∈ Λ(q) ∩ S(q), the uniform integrability of {log ξ_{m,l,x}}_{m≥m_0(x)}, it suffices to prove, for such x, a positive C_0(x) and m_0(x) ∈ N, that
$$\sup_{m\ge m_0(x)}\mathsf E\,G_{N_1}(|\log\xi_{m,l,x}|) \le C_0(x) < \infty.$$
Note that, for u ∈ (1/e_[N1], e_[N1]], we have G_{N1}(∣log u∣) = 0.
Therefore, due to Lemma 2, for x ∈ Λ(q) ∩ S(q) and m ≥ l, we get
E G_{N1}(∣log ξ_{m,l,x}∣) = I_1(m,x) + I_2(m,x), where
[TABLE]
For convenience sake we write I1(m,x) and I2(m,x) without indicating their dependence on N1,l and d. Recall that N1 is fixed.
Part (3a). We provide bounds for I1(m,x).
Take R_1 > 0 appearing in the conditions of Theorem 1 and any u ∈ (0, 1/e_[N1]]. Let us denote m_1 := max{⌈1/(e_[N1] R_1^d)⌉, l}, where ⌈a⌉ := inf{m ∈ Z : m ≥ a}, a ∈ R. Then
r_m(u) = (u/m)^{1/d} ≤ (1/(e_[N1] m))^{1/d} ≤ R_1
if m ≥ m_1. Note also that everywhere below we may consider only m ≥ l, because the size of the sample Y_m cannot be less than the number l of neighbors (see, e.g., (3.10)). Thus, for R_1 > 0, u ∈ (0, 1/e_[N1]], x ∈ R^d and m ≥ m_1,
[TABLE]
and we obtain an inequality
[TABLE]
If ε∈(0,1] and t∈[0,1] then, for all m≥1,
invoking the Bernoulli inequality, one has
[TABLE]
By the assumptions of Theorem 1, Q_{p,q}(ε_1,R_1) < ∞ for some ε_1 > 0, R_1 > 0. According to Lemma 1 we can assume that ε_1 < 1. Thus, due to (3.23) and since W_{m,x}(u) ∈ [0,1] for all x ∈ R^d, u > 0 and m ≥ l, we get
[TABLE]
In view of (3.10), (3.22) and (3.24) one can claim now that, for all x ∈ Λ(q) ∩ S(q), u ∈ (0, 1/e_[N1]] and m ≥ m_1,
[TABLE]
Therefore, for any x∈Λ(q)∩S(q) and m≥m1, one can write
[TABLE]
where U_1(ε,N,d) := V_d^ε L_N(ε), L_N(ε) := ∫_{[e_[N−1],∞)} (log_[N](t) + 1) e^{−εt} dt < ∞ for each ε > 0 and any N ∈ N. We took into account that (−g_{N1}(u)) ≤ (1/u)(log_[N1](−log u) + 1) if u ∈ (0, 1/e_[N1]].
Part (3b). We give bounds for I_2(m,x). Since g_{N1}(u) ≤ (log_[N1+1](u) + 1)/u if u ∈ (e_[N1], ∞), we can write, for m ≥ max{e^2_[N1], l},
[TABLE]
Evidently,
[TABLE]
where Pm,x(u)=1−Wm,x(u) and Z∼Bin(m,Pm,x(u)).
By Markov’s inequality, P(Z ≥ x) ≤ e^{−λx} E e^{λZ} for any λ > 0 and x > 0.
One has
[TABLE]
Consequently, for each λ>0,
[TABLE]
To simplify the bounds we take λ = 1 and set S_1 = S_1(l) := e^{l−1}, S_2 := 1 − 1/e (recall that l is fixed). Thus S_1 ≥ 1 and S_2 < 1. Therefore,
[TABLE]
where we have used the elementary inequality 1 − t ≤ e^{−t}, t ∈ [0,1].
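The bound of Part (3b) combines Markov's inequality for e^{λZ} with the binomial moment generating function E e^{λZ} = (1 − P + Pe^λ)^m and the inequality 1 − t ≤ e^{−t}. With λ = 1 and P = 1 − W_{m,x}(u) one gets P(Z ≥ m − l + 1) ≤ e^{l−1}(1 − S_2 W_{m,x}(u))^m ≤ S_1 e^{−S_2 m W_{m,x}(u)}. The Python sketch below compares the exact tail with both bounds; the function names and the values m = 200, l = 3, W = 0.05 are hypothetical, and SciPy is assumed for the exact binomial tail.

```python
import math
from scipy.stats import binom


def markov_exponential_bound(x, m, p, lam=1.0):
    """P(Z >= x) <= exp(-lam * x) * E exp(lam * Z) for Z ~ Bin(m, p) and lam > 0."""
    mgf = (1.0 - p + p * math.exp(lam)) ** m  # binomial moment generating function at lam
    return math.exp(-lam * x) * mgf


def exact_tail(x, m, p):
    """P(Z >= x) for integer x, since binom.sf(x - 1) = P(Z > x - 1)."""
    return float(binom.sf(x - 1, m, p))


m, l, w = 200, 3, 0.05  # hypothetical sample size, neighbour order and W_{m,x}(u)
x = m - l + 1  # Z >= m - l + 1 means at most l - 1 points fall inside the ball
S1, S2 = math.exp(l - 1), 1.0 - 1.0 / math.e

exact = exact_tail(x, m, 1.0 - w)
bound = markov_exponential_bound(x, m, 1.0 - w)
simplified = S1 * math.exp(-S2 * m * w)  # after applying 1 - t <= exp(-t)
print(exact, bound, simplified)
```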
For R2>0 appearing in conditions of the Theorem and any u∈(e[N],m], one can choose m2:=max{⌈R22d1⌉,⌈e[N1]2⌉,l} such that if m≥m2 then
rm(u)=(mu)1/d≤(m1)1/d≤R2.
Due to (3.11) and (3.39), for u∈(e[N1],m] and m≥m2, one has
[TABLE]
by definition of mf (for f=q) in (2.2).
Now we use the following Lemma 3.2 of [7].
Lemma 3
For a version of a density q and each R>0, one has μ(S(q)∖Dq(R))=0 where Dq(R):={x∈S(q):mq(x,R)>0} and mq(⋅,R) is defined according to (2.2).
It is easily seen that, for any t > 0 and each δ ∈ (0,e], one has e^{−t} ≤ t^{−δ} (indeed, the maximum of δ log t − t over t > 0 equals δ log δ − δ ≤ 0 for δ ≤ e).
Thus, for x∈Dq(R2), m≥m2, u∈(e[N],m] and ε2>0, we deduce from conditions of the Theorem (in view of Lemma 1 one can suppose that ε2∈(0,e]),
taking into account that mq(x,R2)>0 for x∈Dq(R2) and applying relation (3.42), that
[TABLE]
Thus, for all x∈Λ(q)∩S(q)∩Dq(R2) and any m≥m2,
[TABLE]
where U2(ε,N,d,l):=S1(l)LN(ε)(S2Vd)−ε.
Part (3c). Consider J2(m,x).
In view of (3.43), for all x∈Λ(q)∩S(q)∩Dq(R2) and any m≥m2, it holds 1−Fm,l,x(m)≤S1(S2Vdmq(x,R2)m)−ε2. Thus (as m2≥2)
[TABLE]
Then, for all x∈Λ(q)∩S(q)∩Dq(R2) and any m≥m2,
[TABLE]
where U3(m,ε2,N1,d,l):=23S1(l)(S2Vd)−ε2m−2ε2logm(log[N1](2logm)+1)→0, m→∞.
Part (3d). To get bounds for J3(m,x) we employ several auxiliary results.
Lemma 4
For each N∈N and any ν>0, there are a:=a(d,ν)≥0,b:=b(N,d,ν)≥0 such that, for arbitrary x,y∈Rd,
On the other hand, by (3.10), one has F_{1,1,x}(w) = 1 − (1 − W_{1,x}(w)) = W_{1,x}(w). Consequently, for any m ∈ N, w ≥ 0 and all x ∈ R^d,
[TABLE]
Moreover, F_{1,1,x}(w) = P(∥Y − x∥^d ≤ w). So ξ_{1,1,x} has the same law as ∥Y − x∥^d. Thus, in view of Lemmas 2 and 4 (for N = N_1 and ν = 1)
[TABLE]
since GN(t)=0 for t∈[0,e[N−1]], N∈N.
Now we will estimate 1−Fm,l,x(u) in a way different from
(3.38).
Fix any δ > 0. Note that, for all m ≥ (l−1)(1 + 1/δ) and s ∈ {0,…,l−1}, it holds m/(m−s) ≤ m/(m−l+1) ≤ 1 + δ. Then, for all x ∈ R^d, u ≥ 0 and m ≥ max{l, (l−1)(1 + 1/δ)}, in view of (3.10) one can write
[TABLE]
[TABLE]
We are going to employ the following statement as well.
Lemma 5
For each N∈N, a function log[N](t), t>e[N−1],
is slowly varying at infinity.
Its proof is elementary and thus is omitted.
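Slow variation of log_[N] means that log_[N](λt)/log_[N](t) → 1 as t → ∞ for every λ > 0. The Python sketch below makes this visible numerically; the helper names `log_iter` and `ratio`, the factor λ = 10 and the grid of t values are our own choices, and the output is an illustration, not a substitute for the proof.

```python
import math


def log_iter(t, N):
    """N-fold iterated logarithm log_[N](t)."""
    for _ in range(N):
        t = math.log(t)
    return t


def ratio(t, N, lam=10.0):
    """log_[N](lam * t) / log_[N](t); slow variation means this tends to 1 as t grows."""
    return log_iter(lam * t, N) / log_iter(t, N)


grid = [1e3, 1e6, 1e12, 1e100]
for N in (1, 2, 3):
    print(N, [round(ratio(t, N), 4) for t in grid])
```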
Part (3e). Now we are ready to get the bound for J3(m,x). Set u=mw. Then one has
[TABLE]
Inequality w>m and Lemma 5 imply log[N1+1](mw)≤log[N1+1](w2)=log[N1](2logw)≤2log[N1+1](w) for w large enough, namely for all w≥W, where W=W(N1).
Take δ > 0 and set m_3 := max{l, ⌈(l−1)(1 + 1/δ)⌉, ⌈W(N_1)⌉, ⌈e_[N1]⌉}. Let further m ≥ m_3. Then
Let us note: 1) P_X(S(p) ∖ A_p(G_{N1})) = 0 as we assumed that K_{p,q}(1,N_1) < ∞;
2) P_X(S(p) ∖ S(q)) = 0 as P_X ≪ P_Y;
3) μ(S(q) ∖ (Λ(q) ∩ D_q(R_2))) = 0 due to Lemma 3.
Since P_X ≪ μ, we conclude that P_X(S(q) ∖ (Λ(q) ∩ D_q(R_2))) = 0. Hence P_X(S(p) ∖ (Λ(q) ∩ D_q(R_2))) = 0 in view of 2), because B ∖ C ⊂ (B ∖ A) ∪ (A ∖ C) for any A, B, C ⊂ R^d. Set further A := Λ(q) ∩ S(q) ∩ D_q(R_2) ∩ S(p) ∩ A_p(G_{N1}).
It follows from 1), 2) and 3) that P_X(S(p) ∖ A) = 0, so P_X(A) = 1. We are going to consider only x ∈ A.
Then, by virtue of (3.55) and (3.59), for all m≥m3 and x∈A, we come to the inequality
[TABLE]
where A(δ,d):=2(1+δ)a(d,1), B(δ,d,N1):=2(1+δ)b(N1,d,1).
Part (3f). Thus, for each x∈A and m≥max{m1,m2,m3}, taking into account (3.30), (3.46), (3.47) and (3.60) we can claim that
[TABLE]
Moreover, for any κ>0, one can take m4=m4(κ,ε2,N1,d,l)∈N such that U3(m,ε2,N1,d,l)≤κ for m≥m4. Then by virtue of (3.64), for each x∈A and m≥m0:=max{m1,m2,m3,m4},
[TABLE]
Hence, for each x∈A, the uniform integrability of the family {logξm,l,x}m≥m0 is established.
Step 4. Now we verify (2.8). We have already proved, for each x∈A (thus, for PX-almost every x belonging to S(p)) that E(logϕm,l(1)∣X1=x)→ψ(l)−logVd−logq(x), m→∞.
Set Zm,l(x):=E(logϕm,l(1)∣X1=x)=Elogξm,l,x.
Consider x∈A and take any m≥max{m1,m2,m3,m4}. We use the following property of GN which is shown in Appendix.
Lemma 6
For each N∈N, a function GN is convex on R+.
Thus a function GN1 is nondecreasing and convex.
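Lemma 6 can likewise be sanity-checked numerically: for a convex function, all second differences on a uniform grid are nonnegative. The Python sketch below evaluates G_N from (2.3) on [0, 1000) with step 0.01 for N = 1, 2, 3. The grid, the range and the helper names are our assumptions, and a finite check of this kind does not prove convexity.

```python
import numpy as np


def e_tower(N):
    """e_[0] = 1 and e_[N] = exp(e_[N-1])."""
    value = 1.0
    for _ in range(N):
        value = float(np.exp(value))
    return value


def G_vec(t, N):
    """Vectorised G_N from (2.3); log_[N] is evaluated only where t > e_[N-1]."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    mask = t > e_tower(N - 1)
    inner = t[mask]
    for _ in range(N):
        inner = np.log(inner)
    out[mask] = t[mask] * inner
    return out


def min_second_difference(N, t_max=1000.0, h=0.01):
    """Smallest discrete second difference of G_N on a uniform grid of [0, t_max)."""
    t = np.arange(0.0, t_max, h)
    g = G_vec(t, N)
    return float((g[2:] - 2.0 * g[1:-1] + g[:-2]).min())


for N in (1, 2, 3):
    print(N, min_second_difference(N))
```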
On account of the Jensen inequality
Formulas (3.71) and (3.72) show that Fn,k,x(u) is
the regular conditional distribution function of ζn,k(i) given Xi=x. Moreover, for any fixed u>0 and x∈Λ(p)∩S(p) (thus p(x)>0),
[TABLE]
Hence ξ_{n,k,x} → ξ_{k,x} in law, x ∈ Λ(p) ∩ S(p), n → ∞. For N ∈ N, set A_p(G_N) := {x ∈ S(p) : R_N(x) < ∞}, where
[TABLE]
Introduce A:=Λ(p)∩S(p)∩Dp(R4)∩Ap(GN2). Then P(A)=1 and, for x∈A, one can verify that
EGN2(∣logξn,k,x∣)≤C0(x)<∞ and therefore
Elogξn,k,x→Elogξk,x. Thus E(logζn,k(1)∣X1=x)→ψ(k)−logVd−logp(x), n→∞.
Set Zn,k(x):=E(logζn,k(1)∣X1=x).
One can see that, for all n≥n0, ∫RdGN2(∣Zn,k(x)∣)p(x)dx<∞.
Hence similar to Steps 1–4 we come to relation (3.3).
4 Proof of Theorem 2
First of all note that, in view of Lemma 1, the finiteness of K_{p,q}(2,N_1) and K_{p,p}(2,N_2) implies the finiteness of K_{p,q}(1,N_1) and K_{p,p}(1,N_2), respectively. Thus the conditions of Theorem 2 entail the validity of the statements of Theorem 1. Consequently, under the conditions of Theorem 2, for n and m large enough, one can claim that D_{n,m}(k,l) ∈ L^1(Ω) and E D_{n,m}(k,l) → D(P_X∣∣P_Y), as n, m → ∞.
We will show that Dn,m(k,l)∈L2(Ω) for all n and m large enough. Then we can write
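$$\mathsf E\big(D_{n,m}(k,l) - D(P_X\|P_Y)\big)^2 = \operatorname{var}\big(D_{n,m}(k,l)\big) + \big(\mathsf E D_{n,m}(k,l) - D(P_X\|P_Y)\big)^2.$$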
Therefore to prove (2.10) we will demonstrate that
var(Dn,m(k,l))→0, n,m→∞.
Due to (3.10) the random variables
logϕm,l(1),…,logϕm,l(n) are identically distributed (and logζn,k(1), …,logζn,k(n) are identically distributed as well). Hence (3.1) yields
[TABLE]
We do not strictly adhere to the notation used in the proof of Theorem 1. Namely, the choice of the sets A ⊂ R^d, A ⊂ R^d, positive U_j, C_j(x), C_j(x) and integers m_j, n_j, where j ∈ Z_+ and x ∈ R^d, could be different.
The proof of Theorem 2 is also divided into several steps. Steps 1-3 are devoted to the demonstration of the relation (1/n) var(log ϕ_{m,l}(1)) → 0 as n, m → ∞, while Step 4 contains the proof of the relation (2/n²) Σ_{1≤i<j≤n} cov(log ϕ_{m,l}(i), log ϕ_{m,l}(j)) → 0 as n, m → ∞.
In Step 5 we establish that
[TABLE]
This step is rather involved.
In Step 6 we come to the desired statement var(Dn,m(k,l))→0, n,m→∞.
Step 1. We study Elog2(ϕm,l(1)), as m→∞. Consider
[TABLE]
where the first four sets appeared in the proof of Theorem 1, and A_{p,2}(G_N), for N ∈ N and a probability density p on R^d, is defined similarly to A_p(G_N). Namely, for x ∈ R^d and N ∈ N, introduce
[TABLE]
and set Ap,2(GN):={x∈S(p):RN,2(x)<∞}.
Then PX(S(p)∖Ap,2(GN1))=0
since Kp,q(2,N1)<∞. It is easily seen that PX(A)=1. The reasoning is the same as in the proof of Theorem 1.
Recall that, for each x ∈ A, one has log ξ_{m,l,x} → log ξ_{l,x} in law, m → ∞, where ξ_{m,l,x} := m∥x − Y_(l)(x,Y_m)∥^d and ξ_{l,x} has the Γ(V_d q(x), l) distribution. Convergence in law of random variables is preserved under continuous mapping. Hence, for any
x∈A, we come to the relation
where h_1 := h_1(l,d) and h_2 := h_2(l,d) depend only on the fixed l and d.
We prove now that, for x∈A, one has
[TABLE]
By virtue of (4.11) and (4.15)
relation (4.16) is equivalent to the following one Elog2ξm,l,x→Elog2ξl,x, m→∞.
So, in view of (4.8) to prove (4.16) it is sufficient to show that, for each x∈A, a family {log2ξm,l,x}m≥m0(x) is uniformly integrable for some m0(x)∈N. As in the proof of Theorem 1, we can verify that, for all x∈A and some nonnegative C0(x),
[TABLE]
Step 2. Now our goal is to prove (4.17).
For each N∈N, introduce ρ(N):=exp{e[N−1]} and
[TABLE]
As usual, a product over an empty set (if N=1) is equal to 1.
The proof of this lemma is omitted, being quite similar to that of Lemma 2. By Lemma 7 and since G_{N1}(log² u) = 0 for u ∈ (1/ρ(N_1), ρ(N_1)], one has
[TABLE]
To simplify notation we do not indicate the dependence of Ii(m,x) (i=1,2) on N1, l and d.
We divide further proof into several parts.
Part (2a). At first we consider I1(m,x).
As in the proof of Theorem 1, for fixed R_1 > 0 and ε_1 > 0 appearing in the conditions of Theorem 2, the inequality
F_{m,l,x}(u) ≤ (M_q(x,R_1))^{ε_1} V_d^{ε_1} u^{ε_1}
holds for any x ∈ A, u ∈ (0, 1/ρ(N_1)] and m ≥ m_1 := max{⌈1/(ρ(N_1) R_1^d)⌉, l}.
Taking into account that $0\le-h_{N_1}(u)\le\frac{-2\log u}{u}\big(\log_{[N_1]}(\log^2u)+1\big)$ if $u\in(0,1/\rho(N_1)]$, we get, for $m\ge m_1$,
[TABLE]
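Here $U_1(\varepsilon,N,d):=V_d^{\varepsilon}L_{N,2}(\varepsilon)$, where $L_{N,2}(\varepsilon):=\int_{[\sqrt{e_{[N-1]}},\infty)}2t\big(\log_{[N]}(t^2)+1\big)e^{-\varepsilon t}\,dt<\infty$
for each $\varepsilon>0$ and any $N\in\mathbb{N}$.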
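To see concretely that $L_{N,2}(\varepsilon)$ is finite, one can evaluate it numerically. This is a sketch under stated assumptions: we take the conventions $e_{[0]}=1$, $e_{[N]}=\exp\{e_{[N-1]}\}$ (consistent with $e_{[N-1]}\ge1$ used in the Appendix) and, as the lower limit, $t_0=\sqrt{e_{[N-1]}}$, the value of $-\log u$ beyond which $G_N(\log^2u)$ becomes positive. The infinite range is truncated where $e^{-\varepsilon t}$ is negligible, and Simpson's rule is simply our choice of quadrature; all function names are hypothetical.

```python
import math


def e_iter(n: int) -> float:
    """Iterated exponential, assuming e_[0] = 1 and e_[n] = exp(e_[n-1])."""
    v = 1.0
    for _ in range(n):
        v = math.exp(v)
    return v


def log_iter(n: int, t: float) -> float:
    """Iterated logarithm log_[n](t): log applied n times (needs t > e_[n-1])."""
    for _ in range(n):
        t = math.log(t)
    return t


def simpson(f, a: float, b: float, steps: int = 20000) -> float:
    """Composite Simpson rule on [a, b]; steps is rounded up to an even number."""
    steps += steps % 2
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def L_N2(eps: float, N: int, tail: float = 60.0) -> float:
    """Truncated L_{N,2}(eps) = int_{t0}^inf 2t (log_[N](t^2) + 1) e^{-eps t} dt.

    t0 = sqrt(e_[N-1]) is an assumption (see the lead-in); the dropped tail
    beyond t0 + tail/eps is of order e^{-tail} times a slowly growing factor."""
    t0 = math.sqrt(e_iter(N - 1))

    def integrand(t: float) -> float:
        return 2.0 * t * (log_iter(N, t * t) + 1.0) * math.exp(-eps * t)

    return simpson(integrand, t0, t0 + tail / eps)


for N in (1, 2, 3):
    print(N, L_N2(1.0, N), L_N2(0.5, N))
```

The values stay finite for every $\varepsilon>0$, which is all the argument needs: the exponential factor dominates the iterated-logarithm growth.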
Part (2b). Consider $I_2(m,x)$. As in the proof of Theorem 1, taking into account that
$h_{N_1}(u)\le\frac{2\log u}{u}\big(\log_{[N_1]}(\log^2u)+1\big)$ for $u\in(\rho(N_1),\infty)$, we write, for all $m\ge\max\{\rho^2(N_1),l\}$,
[TABLE]
where we do not indicate the dependence of $J_j(m,x)$ ($j=1,2,3$) on $N_1$ and $l$.
For the $R_2>0$ and $\varepsilon_2>0$ appearing in the conditions of Theorem 2, one can prove (see the proof of Theorem 1) that the inequality
[TABLE]
holds for any $x\in A$, $u\in(\rho(N_1),\sqrt{m}]$ and all $m\ge m_2:=\max\{\lceil1/R_2^{2d}\rceil,\lceil\rho^2(N_1)\rceil,l\}$. Here $S_1:=S_1(l)$ and $S_2$ are the same as in the proof of Theorem 1. For all $x\in A$ and $m\ge m_2$, we come to the relations
[TABLE]
where U2(ε,N,d,l):=2S1(l)LN,2(ε)(S2Vd)−ε2.
Part (2c). Now we turn to $J_2(m,x)$.
Take $\delta>0$.
Then, due to (4.21), for all $x\in A$ and any
$m\ge m_2$,
[TABLE]
where U3(m,ε,N,d,l):=4S1(S2Vd)−ε2m−2ε2(log2m)(log[N1](4log2m)+1)→0, m→∞.
Part (2d). Now we consider $J_3(m,x)$. Take $u=\sqrt{mw}$. Then $J_3(m,x)$ has the form
Pick some $\delta>0$ and set $m_3:=\max\{l,\lceil(l-1)(1+1/\delta)\rceil,\lceil T(N_1)\rceil,\lceil\rho(N_1)\rceil\}$, where $T(N)$ was introduced in (4.29).
Consider m≥m3.
In view of Lemma 4 (for $N=N_1$ and $\nu=2$), (3.57), (4.29), (2.7) and
since $w>m$,
[TABLE]
[TABLE]
Here $R_{N,2}(x)$ is defined in (4.7), $A(\delta,d):=4(1+\delta)a(d,2)$ and
$B(\delta,d,N_1):=4(1+\delta)\big(a(d,2)G_{N_1}(e^{2}_{[N_1-1]})+b(N_1,d,2)\big)$.
Part (2e). Thus, for each $x\in A$ and $m\ge\max\{m_1,m_2,m_3\}$, taking into account (4.20), (4.24), (4.28) and (4.36), we can claim that
[TABLE]
Moreover, for any $\kappa>0$, one can choose $m_4:=m_4(\kappa,\varepsilon_2,N_1,d,l)\in\mathbb{N}$ such that $U_3(m,\varepsilon_2,N_1,d,l)\le\kappa$ for $m\ge m_4$. Then by (4.40), for each $x\in A$ and $m\ge m_0:=\max\{m_1,m_2,m_3,m_4\}$,
[TABLE]
Hence we have proved the uniform integrability of the family $\{\log^2\xi_{m,l,x}\}_{m\ge m_0}$ for each $x\in A$. Therefore, relation (4.16) holds for any $x\in A$ (thus for $P_X$-almost every $x\in S(p)$).
Step 3. Now we can return to $\mathsf{E}\log^2\phi_{m,l}(1)$. Set $\Delta_{m,l}(x):=\mathsf{E}(\log^2\phi_{m,l}(1)\mid X_1=x)=\mathsf{E}\log^2\xi_{m,l,x}$.
Consider $x\in A$ and take any $m\ge m_0$. The function $G_{N_1}$ is nondecreasing and convex according to Lemma 6. Due to the Jensen inequality
[TABLE]
Relation (4.44) guarantees that, for each $x\in A$ and all $m\ge m_0$,
[TABLE]
We have established the uniform integrability of the family $\{\Delta_{m,l}(\cdot)\}_{m\ge m_0}$
(w.r.t. the measure $P_X$). Therefore, we conclude that
[TABLE]
It is easily seen that the finiteness of the integrals $Q_{p,q}(\varepsilon_1,R_1)$ and $T_{p,q}(\varepsilon_2,R_2)$ implies that
[TABLE]
This is verified as in Remark 4, taking into account that $\log^2z\le\frac{4}{\varepsilon^2}z^{\varepsilon}$ for all $z\ge1$ and $\varepsilon>0$.
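Thus, $\mathsf{E}\log^2\phi_{m,l}(1)\to\tau_2<\infty$. Hence $\mathsf{var}(\log\phi_{m,l}(1))=\mathsf{E}\log^2\phi_{m,l}(1)-(\mathsf{E}\log\phi_{m,l}(1))^2\to\tau_2-\tau_1^2<\infty$ as $m\to\infty$, where $\tau_1:=\psi(l)-\log V_d-\int_{\mathbb{R}^d}p(x)\log q(x)\,dx$ according to (3.2). Consequently, $\frac{1}{n}\mathsf{var}(\log\phi_{m,l}(1))\to0$ as $n,m\to\infty$.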
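As a numerical cross-check of Step 3 in a case where everything is explicit, take $d=1$ ($V_1=2$), $p=N(0,1)$ and $q=N(0,4)$ (our choice). Assuming, as the Gamma limit suggests, that $\tau_2=\int p(x)\big[\psi'(l)+(\psi(l)-\log V_d-\log q(x))^2\big]dx$ (the second moment of $\log$ of a $\Gamma(V_dq(x),l)$ variable, integrated against $p$), the limiting variance becomes $\tau_2-\tau_1^2=\psi'(l)+\mathsf{var}\,\log q(X_1)$. The sketch compares $\tau_1$ and this limit with Monte Carlo moments of $\log\phi_{m,l}(1)$; the function names are hypothetical.

```python
import heapq
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(l: int) -> float:
    """psi(l) for a positive integer l."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, l))


def trigamma_int(l: int) -> float:
    """psi'(l) for a positive integer l: pi^2/6 - sum_{j=1}^{l-1} 1/j^2."""
    return math.pi ** 2 / 6 - sum(1.0 / j ** 2 for j in range(1, l))


def gaussian_limits(l, mu_p, s_p, mu_q, s_q):
    """tau_1 and the assumed limit tau_2 - tau_1^2 for d = 1 (V_1 = 2),
    p = N(mu_p, s_p^2) and q = N(mu_q, s_q^2)."""
    delta = mu_p - mu_q
    int_p_log_q = (-0.5 * math.log(2 * math.pi * s_q ** 2)
                   - (s_p ** 2 + delta ** 2) / (2 * s_q ** 2))
    tau1 = digamma_int(l) - math.log(2.0) - int_p_log_q
    # var log q(X) = var((X - mu_q)^2) / (4 s_q^4) with X - mu_q ~ N(delta, s_p^2)
    var_log_q = (2 * s_p ** 4 + 4 * delta ** 2 * s_p ** 2) / (4 * s_q ** 4)
    return tau1, trigamma_int(l) + var_log_q


def log_phi_draws(m, l, mu_p, s_p, mu_q, s_q, reps, seed=1):
    """Independent draws of log phi_{m,l}(1) = log(m * |X_1 - Y_(l)(X_1, Y_m)|), d = 1."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        x = rng.gauss(mu_p, s_p)
        dists = (abs(x - rng.gauss(mu_q, s_q)) for _ in range(m))
        draws.append(math.log(m * heapq.nsmallest(l, dists)[-1]))
    return draws


tau1, var_limit = gaussian_limits(3, 0.0, 1.0, 0.0, 2.0)
draws = log_phi_draws(400, 3, 0.0, 1.0, 0.0, 2.0, reps=1500)
mean = sum(draws) / len(draws)
var = sum((z - mean) ** 2 for z in draws) / (len(draws) - 1)
print(tau1, mean)        # limit of E log phi_{m,l}(1) vs. Monte Carlo mean
print(var_limit, var)    # psi'(l) + var log q(X_1) vs. Monte Carlo variance
```

A wider $q$ is chosen so that every $X_1$ has many $Y$-neighbours nearby; with a $q$ that is thin where $p$ puts mass, finite-$m$ bias would be visible.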
Step 4. Now we consider $\mathsf{cov}(\log\phi_{m,l}(i),\log\phi_{m,l}(j))$ for $i\neq j$, where $i,j\in\{1,\dots,n\}$.
For $x,y\in\mathbb{R}^d$, introduce the conditional distribution function
[TABLE]
For $x,y\in\mathbb{R}^d$, $u,w\ge0$, $i\neq j$,
[TABLE]
Here $r_m(a)=(a/m)^{1/d}$ for all $a\ge0$, as before. One can write $\Phi_{m,l,x,y}(u,w)$ instead of $\Phi^{i,j}_{m,l,x,y}(u,w)$, because the right-hand side of (4.50) does not depend on $i$ and $j$.
Set $A_1:=\{(x,y):x\in A,\,y\in A,\,x\neq y\}$ and $A_2:=\{(x,y):x\in A,\,y\in A,\,x=y\}$, where $A$ is introduced in (4.6). Evidently, $(P_X\otimes P_X)(A_1)=1$ and $(P_X\otimes P_X)(A_2)=0$.
Consider $(x,y)\in A_1$. Obviously, $r_m(a)\to0$ as $m\to\infty$ for any $a>0$. For $(x,y)\in A_1$
we take $m_5=m_5(u,w,\|x-y\|):=\lceil(2/\|x-y\|)^d\max\{u,w\}\rceil$. Then $r_m(u)<\|x-y\|/2$ and $r_m(w)<\|x-y\|/2$ for all $m\ge m_5$.
Thus $B(x,r_m(u))\cap B(y,r_m(w))=\emptyset$ if $m\ge m_5$. Consequently, for $m\ge m_6(u,w,\|x-y\|):=\max\{m_5,2(l-1)\}$,
[TABLE]
In view of (3.10), (4.50) and (4.53), one has
the following representation for $\Phi_{m,l,x,y}(u,w)$:
[TABLE]
For any fixed $(x,y)\in A_1$ and $u,w>0$,
[TABLE]
Then, according to (4.56), (3.14) and (4.59), for all fixed $u,w>0$ and $(x,y)\in A_1$, one has
[TABLE]
Thus $\Phi_{l,x,y}(\cdot,\cdot)$ is the distribution function of the vector
$\eta_{l,x,y}:=(\xi_{l,x},\xi_{l,y})$,
where $\xi_{l,x}\sim\Gamma(V_dq(x),l)$, $\xi_{l,y}\sim\Gamma(V_dq(y),l)$
and the components of $\eta_{l,x,y}$ are independent.
Observe also that $\Phi_{m,l,x,y}(\cdot,\cdot)$ is the distribution function of the random vector
$\eta_{m,l,x,y}:=(\xi_{m,l,x},\xi_{m,l,y})$.
Consequently, we have shown that $\eta_{m,l,x,y}\xrightarrow{law}\eta_{l,x,y}$ as $m\to\infty$.
Therefore, for any $(x,y)\in A_1$,
[TABLE]
Here we exclude a set of zero probability on which the random variables under consideration can be equal to zero.
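The asymptotic independence of $\xi_{m,l,x}$ and $\xi_{m,l,y}$ for $x\neq y$, and the reason why the diagonal $A_2$ is set aside, can be seen in a small simulation. In the sketch below (our assumptions: $d=1$, $q$ uniform on $(0,1)$, hypothetical helper names) both statistics are computed from the same sample $\mathbb{Y}_m$. For well separated points the sample correlation of the logarithms should be near $0$, whereas for points much closer than the typical $l$-th neighbour distance it should be near $1$, because the $l$-th nearest neighbour distance is a $1$-Lipschitz function of the base point.

```python
import heapq
import math
import random


def lth_nn_dist(x, ys, l):
    """Distance from x to its l-th nearest neighbour among ys (d = 1)."""
    return heapq.nsmallest(l, (abs(x - y) for y in ys))[-1]


def corr(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def log_xi_pairs(x, y, m=400, l=3, reps=800, seed=2):
    """Pairs (log xi_{m,l,x}, log xi_{m,l,y}) computed from the SAME sample Y_m,
    with Y_1, ..., Y_m i.i.d. Uniform(0, 1)."""
    rng = random.Random(seed)
    lx, ly = [], []
    for _ in range(reps):
        ys = [rng.random() for _ in range(m)]
        lx.append(math.log(m * lth_nn_dist(x, ys, l)))
        ly.append(math.log(m * lth_nn_dist(y, ys, l)))
    return lx, ly


far = corr(*log_xi_pairs(0.25, 0.75))    # disjoint neighbourhoods
near = corr(*log_xi_pairs(0.5, 0.5001))  # heavily overlapping neighbourhoods
print(far, near)
```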
Note that, for all $i,j\in\mathbb{N}$, $i\neq j$,
[TABLE]
Obviously, in view of (3.20) and since $\xi_{l,x}$ and $\xi_{l,y}$ are independent, one has
[TABLE]
Now we intend to verify that, for any $(x,y)\in A_1$,
[TABLE]
Equivalently, one can prove that, for each $(x,y)\in A_1$, $\mathsf{E}(\log\xi_{m,l,x}\log\xi_{m,l,y})\to\mathsf{E}(\log\xi_{l,x}\log\xi_{l,y})$ as $m\to\infty$.
Part (4a). We establish the uniform integrability of the family $\{\log\xi_{m,l,x}\log\xi_{m,l,y}\}_{m\ge m_0}$ for $(x,y)\in A_1$. The function $G_{N_1}(\cdot)$ is nondecreasing and convex. Thus,
for any $(x,y)\in A_1$, following the proof of Step 2, one can find $m_0$ (the same as in Step 2) such that, for all $m\ge m_0$,
[TABLE]
Clearly, $U_1$, $U_2$, $\kappa$, $A$, $B$ do not depend on $x$ or $y$ by virtue of (4.43).
Hence, for any $(x,y)\in A_1$, the family $\{\log\xi_{m,l,x}\log\xi_{m,l,y}\}_{m\ge m_0}$ is uniformly integrable. Therefore we come to (4.65) for $(x,y)\in A_1$.
Part (4b). Set $T_{m,l}(x,y):=\mathsf{E}\big(\log\phi_{m,l}(1)\log\phi_{m,l}(2)\mid X_1=x,X_2=y\big)=\mathsf{E}(\log\xi_{m,l,x}\log\xi_{m,l,y})$, where $(x,y)\in A_1$.
Then (4.65) means that $T_{m,l}(x,y)\to(\psi(l)-\log V_d-\log q(x))(\psi(l)-\log V_d-\log q(y))$ as $m\to\infty$ for any $(x,y)\in A_1$. Note that
[TABLE]
Due to (4.69) and (4.72), since $(P_X\otimes P_X)(A_1)=1$, one can conclude that, for all $m\ge m_0$,
[TABLE]
Hence the family
$\{T_{m,l}(x,y)\}_{m\ge m_0}$, $(x,y)\in A_1$, is uniformly integrable w.r.t. $P_X\otimes P_X$.
Consequently,
[TABLE]
Thus
[TABLE]
On the other hand, taking also into account (3.2), we come to the relation
Step 5. Now we consider $\mathsf{cov}(\log\zeta_{n,k}(i),\log\zeta_{n,k}(j))$ for $i\neq j$, where $i,j\in\{1,\dots,n\}$.
Similarly to Step 4, for $x,y\in\mathbb{R}^d$ and $u,w>0$, introduce the conditional distribution function
[TABLE]
where $\eta^{y,i,j}_{n,k,x}:=(n-1)\|x-X_{(k)}(x,\{X_s\}_{s\neq i,j}\cup\{y\})\|^d$. We write further $\Phi_{n,k,x,y}(u,w)$, $\eta^{y}_{n,k,x}$ and $\eta^{x}_{n,k,y}$ instead of $\Phi^{i,j}_{n,k,x,y}(u,w)$, $\eta^{y,i,j}_{n,k,x}$ and $\eta^{x,i,j}_{n,k,y}$, respectively (since
$X_1,X_2,\dots$ are i.i.d. random vectors). Moreover, $\Phi_{n,k,x,y}(u,w)$ is the distribution function of the random vector $\eta_{n,k,x,y}:=(\eta^{y}_{n,k,x},\eta^{x}_{n,k,y})$ and the regular conditional distribution function of the random vector
$(\zeta_{n,k}(i),\zeta_{n,k}(j))$ given $(X_i,X_j)=(x,y)$. One has
[TABLE]
Introduce
[TABLE]
where the first three sets appeared in the proof of Theorem 1 (Step 5), and $A_{p,2}(G_N)$, for $N\in\mathbb{N}$ and a probability density $p$ on $\mathbb{R}^d$, is defined in full analogy with $A_p(G_N)$. Namely, introduce
[TABLE]
and set $A_{p,2}(G_N):=\{x\in S(p):R_{N,2}(x)<\infty\}$.
Then $P_X(S(p)\setminus A_{p,2}(G_{N_2}))=0$ since $K_{p,p}(2,N_2)<\infty$. It is easily seen that $P_X(\widetilde{A})=1$.
Consider $\widetilde{A}_1:=\{(x,y):x\in\widetilde{A},\,y\in\widetilde{A},\,x\neq y\}$ and $\widetilde{A}_2:=\{(x,y):x\in\widetilde{A},\,y\in\widetilde{A},\,x=y\}$. Evidently, $(P_X\otimes P_X)(\widetilde{A}_1)=1$ and $(P_X\otimes P_X)(\widetilde{A}_2)=0$.
For any $a>0$,
$r_m(a)\to0$ as $m\to\infty$. Hence, for $(x,y)\in\widetilde{A}_1$,
one can find $\widetilde{n}_5=\widetilde{n}_5(u,w,\|x-y\|)=1+\lceil(2/\|x-y\|)^d\max\{u,w\}\rceil$ such that $r_{n-1}(u)<\|x-y\|/2$ and $r_{n-1}(w)<\|x-y\|/2$ if $n\ge\widetilde{n}_5$.
Then $B(x,r_{n-1}(u))\cap B(y,r_{n-1}(w))=\emptyset$ if $n\ge\widetilde{n}_5(u,w,\|x-y\|)$. Thus, for $n\ge\widetilde{n}_6:=\max\{\widetilde{n}_5,2k\}$, one has
[TABLE]
[TABLE]
[TABLE]
Therefore, for each fixed $(x,y)\in\widetilde{A}_1$ and $u,w>0$, we get, as $n\to\infty$,
[TABLE]
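Here $\Phi_{k,x,y}(\cdot,\cdot)$ is the distribution function of the vector
$\eta_{k,x,y}:=(\xi_{k,x},\xi_{k,y})$,
where $\xi_{k,x}\sim\Gamma(V_dp(x),k)$, $\xi_{k,y}\sim\Gamma(V_dp(y),k)$
and the components of $\eta_{k,x,y}$ are independent.
Consequently, we have shown that $\eta_{n,k,x,y}\xrightarrow{law}\eta_{k,x,y}$ as $n\to\infty$.
Therefore, for any $(x,y)\in\widetilde{A}_1$,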
[TABLE]
Here we exclude a set of zero probability on which the random variables under consideration can be equal to zero.
Similarly to (4.62), for $i,j\in\{1,\dots,n\}$, $i\neq j$, we write
[TABLE]
Since $\xi_{k,x}$ and $\xi_{k,y}$ are independent, formula (3.20) yields
[TABLE]
For any fixed $M>0$, consider $\widetilde{A}_{1,M}:=\{(x,y)\in\widetilde{A}_1:\|x-y\|>M\}$. Now our aim is to verify that, for each $(x,y)\in\widetilde{A}_{1,M}$,
[TABLE]
Equivalently, we can prove, for each $(x,y)\in\widetilde{A}_{1,M}$, that
[TABLE]
Restricting attention to $(x,y)\in\widetilde{A}_{1,M}$ is essential for the rest of the proof.
Part (5a). We will establish the uniform integrability of the family $\{\log\eta^{y}_{n,k,x}\log\eta^{x}_{n,k,y}\}_{n\ge\widetilde{n}_0}$ for $(x,y)\in\widetilde{A}_{1,M}$ and some $\widetilde{n}_0\in\mathbb{N}$ which does not depend on $x,y$, but can depend on $M$. Then, due to (4.84), relation (4.91) will hold for such $(x,y)$ as well.
As we have seen, the function $G_{N_2}(\cdot)$ is nondecreasing and convex. Hence
[TABLE]
Let us consider, for instance, $\mathsf{E}G_{N_2}(\log^2\eta^{y}_{n,k,x})$. As in Step 2 we can write
[TABLE]
where
[TABLE]
As usual, a sum over the empty set is equal to $0$ (for $k=1$).
If $u\in(0,1/\rho(N_2)]$, where $\rho(N):=\exp\{\sqrt{e_{[N-1]}}\}$,
and $n\ge\widetilde{n}_1:=\lceil1/(\rho(N_2)M^d)\rceil+1$, then $r_{n-1}(u)\le M$. Thus $r_{n-1}(u)<\|x-y\|$. In view of (4.97), $\widetilde{F}_{n,k,x}^{y}(u)=1-\sum_{s=0}^{k-1}\binom{n-2}{s}\big(V_{n-1,x}(u)\big)^{s}\big(1-V_{n-1,x}(u)\big)^{n-2-s}$. Similarly to (3.27), one has
[TABLE]
for all $(x,y)\in\widetilde{A}_{1,M}$, $u\in(0,1/\rho(N_2)]$ and $n\ge\max\{\widetilde{n}_1(M),\widetilde{n}_2(R_3)\}$, where $\widetilde{n}_2(R_3):=\max\{\lceil1/(\rho(N_2)R_3^d)\rceil+1,k+1\}$. Consequently, $I_1(n,x,y)\le U_1(\varepsilon_3,N_2,d)(M_p(x,R_3))^{\varepsilon_3}$ for all $(x,y)\in\widetilde{A}_{1,M}$ and $n\ge\max\{\widetilde{n}_1(M),\widetilde{n}_2(R_3)\}$.
Moreover, for all $u>0$, in view of (4.97) it holds that
[TABLE]
The same reasoning as in the proof of Theorem 1 (Step 3, Part (3b)) leads to the inequalities
[TABLE]
for all $n\ge\max\{\widetilde{n}_3(R_4),3\}$. Then, similarly to (4.40), the relation
[TABLE]
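is valid for all $(x,y)\in\widetilde{A}_{1,M}$ and $n\ge\widetilde{n}_0(M):=\max\{\widetilde{n}_1,\widetilde{n}_2,\widetilde{n}_3,\widetilde{n}_4(\kappa),3\}$. Here $U_1,U_2,\kappa,A,B$ do not depend on $x$ or $y$.
Thus, in view of (4.93), one has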
[TABLE]
Hence, for any $(x,y)\in\widetilde{A}_{1,M}$, the family $\{\log\eta^{y}_{n,k,x}\log\eta^{x}_{n,k,y}\}_{n\ge\widetilde{n}_0}$ is uniformly integrable. Thus we come to (4.90) for $(x,y)\in\widetilde{A}_{1,M}$.
Part (5b). Set $\widetilde{T}_{n,k}(x,y):=\mathsf{E}\big(\log\zeta_{n,k}(1)\log\zeta_{n,k}(2)\mid X_1=x,X_2=y\big)=\mathsf{E}\log\eta^{y}_{n,k,x}\log\eta^{x}_{n,k,y}$ for all $(x,y)\in\widetilde{A}_1$.
The validity of relation (4.90) is equivalent to the following: for any $(x,y)\in\widetilde{A}_{1,M}$, $\widetilde{T}_{n,k}(x,y)\to(\psi(k)-\log V_d-\log p(x))(\psi(k)-\log V_d-\log p(y))$ as $n\to\infty$.
Now take any $(x,y)\in\widetilde{A}_1$. Then, for any fixed $M>0$, we have proved that
[TABLE]
Note that
[TABLE]
Due to (4.105) and (4.111) one can conclude that, for all $n\ge\widetilde{n}_0$,
[TABLE]
Hence the family
$\{\widetilde{T}_{n,k}(x,y)\mathbb{I}\{\|x-y\|>M\}\}_{n\ge\widetilde{n}_0}$, $(x,y)\in\widetilde{A}_1$, is uniformly integrable w.r.t. $P_X\otimes P_X$.
Consequently, in view of (4.90), for each $M>0$,
[TABLE]
Now we consider the case $\|x-y\|\le M$.
One has $\bigcap_{s=1}^{\infty}\{\|X_1-X_2\|\le1/s\}=\{X_1=X_2\}$ and $P(X_1=X_2)=0$, as $X_1$ and $X_2$ are independent and have a density $p(x)$ w.r.t. the Lebesgue measure $\mu$. Then
[TABLE]
Taking into account that $\int_Ch\,dP\to0$ as $P(C)\to0$ for an integrable function $h$,
we get
[TABLE]
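since $\mathsf{E}|\log\zeta_{n,k}(1)\log\zeta_{n,k}(2)|\le\frac12\big(\mathsf{E}\log^2\zeta_{n,k}(1)+\mathsf{E}\log^2\zeta_{n,k}(2)\big)<\infty$ (the proof is similar to that of the finiteness of $\mathsf{E}\log^2\phi_{m,l}(1)$).
Hence, for any $\gamma>0$, one can find $M_1=M_1(\gamma)>0$ such that, for all $M\in(0,M_1]$ and $n\ge\widetilde{n}_0$,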
[TABLE]
Set $v(t):=\psi(k)-\log V_d-\log p(t)$, $t\in\mathbb{R}^d$.
Also,
there exists $M_2=M_2(\gamma)>0$ such that, for all $M\in(0,M_2]$,
[TABLE]
Take $M=\min\{M_1,M_2\}$. Due to (4.114) one can find $n_7(M,\gamma)$ such that, for all $n\ge\max\{\widetilde{n}_0,n_7(M,\gamma)\}$, the following inequality holds
[TABLE]
So, for any $\gamma>0$, there is $M(\gamma)>0$ such that, for all $n\ge\max\{\widetilde{n}_0,n_7(M,\gamma)\}$, one has
[TABLE]
By virtue of the formula
[TABLE]
and taking into account (4.116) we come to the relation
[TABLE]
Moreover, in view of (3.3) (see Step 5 of the proof of Theorem 1), we have
[TABLE]
Therefore
[TABLE]
Step 6. Reasoning as in Steps 1-3 shows that $\frac{1}{n}\mathsf{var}(\log\zeta_{n,k}(1))\to0$ as $n\to\infty$.
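The statistics $\zeta_{n,k}(i)$ studied in Steps 5 and 6 are those of the Kozachenko-Leonenko entropy estimator. Since $\mathsf{E}\log\zeta_{n,k}(1)\to\psi(k)-\log V_d-\int p\log p$ (cf. the function $v$ introduced in Step 5), the average $\frac1n\sum_i\log\zeta_{n,k}(i)-\psi(k)+\log V_d$ estimates the Shannon differential entropy. The sketch below, assuming a brute-force neighbour search and a standard normal test sample in $d=1$ (true entropy $\frac12\log(2\pi e)\approx1.419$), is an illustration with hypothetical names, not the authors' code.

```python
import heapq
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(k: int) -> float:
    """psi(k) for a positive integer k."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def unit_ball_volume(d: int) -> float:
    """V_d = pi^(d/2) / Gamma(d/2 + 1), the volume of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def kl_entropy(xs, k: int = 3) -> float:
    """(1/n) sum_i log zeta_{n,k}(i) - psi(k) + log V_d, where
    zeta_{n,k}(i) = (n - 1) * ||X_i - X_(k)(X_i, sample without X_i)||^d."""
    n, d = len(xs), len(xs[0])
    total = 0.0
    for i, xi in enumerate(xs):
        dists = (math.dist(xi, xj) for j, xj in enumerate(xs) if j != i)
        r_k = heapq.nsmallest(k, dists)[-1]  # brute-force k-th nearest neighbour
        total += math.log(n - 1) + d * math.log(r_k)
    return total / n - digamma_int(k) + math.log(unit_ball_volume(d))


rng = random.Random(3)
sample = [(rng.gauss(0.0, 1.0),) for _ in range(1000)]  # points as 1-tuples
h_hat = kl_entropy(sample)
print(h_hat, 0.5 * math.log(2 * math.pi * math.e))  # estimate vs. true N(0, 1) entropy
```

The quadratic-time search keeps the sketch short; for large samples a k-d tree would be the usual replacement.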
To prove that
[TABLE]
we write, for $i,j=1,\dots,n$, $u,w>0$, $x,y\in\mathbb{R}^d$, $x\neq y$, $\|x-y\|>r_{n-1}(w)$ (thus $n>\frac{w}{\|x-y\|^d}+1$) and $m\in\mathbb{N}$,
[TABLE]
[TABLE]
Further we combine the estimates obtained in Steps 4 and 5 of the proof of Theorem 2. Note that now we consider
$(x,y)\in A_1\cap\widetilde{A}_1$ and
employ $G_{\max\{N_1,N_2\}}(\cdot)$.
Thus we have established that $\mathsf{var}(\widehat{D}_{n,m}(k,l))\to0$ as $n,m\to\infty$; hence (2.10) holds.
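A minimal sketch of a $k$-nearest neighbor estimate of $D(P\|Q)$ built from the statistics $\phi_{m,l}(i)$ and $\zeta_{n,k}(i)$ of this proof. We assume the form $\frac1n\sum_{i=1}^n\log\frac{\phi_{m,l}(i)}{\zeta_{n,k}(i)}+\psi(k)-\psi(l)$, which the limits $\tau_1=\psi(l)-\log V_d-\int p\log q$ and $\mathsf{E}\log\zeta_{n,k}(1)\to\psi(k)-\log V_d-\int p\log p$ make asymptotically unbiased; the definition of $\widehat{D}_{n,m}(k,l)$ given earlier in the paper is authoritative. The Gaussian test case ($p=N(0,1)$, $q=N(1,1)$, $D(P\|Q)=\frac12$) and all names are our choices.

```python
import heapq
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(k: int) -> float:
    """psi(k) for a positive integer k."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def kth_nn_dist(point, others, k: int) -> float:
    """Distance from point to its k-th nearest neighbour among others (brute force)."""
    return heapq.nsmallest(k, (math.dist(point, o) for o in others))[-1]


def kl_divergence_knn(xs, ys, k: int = 3, l: int = 3) -> float:
    """Assumed form (1/n) sum_i log(phi_{m,l}(i) / zeta_{n,k}(i)) + psi(k) - psi(l), with
    phi_{m,l}(i) = m * ||X_i - Y_(l)(X_i, Y_m)||^d and
    zeta_{n,k}(i) = (n - 1) * ||X_i - X_(k)(X_i, sample without X_i)||^d.
    The factor V_d cancels in the ratio phi / zeta, so it is never computed."""
    n, m, d = len(xs), len(ys), len(xs[0])
    total = 0.0
    for i, xi in enumerate(xs):
        rho = kth_nn_dist(xi, ys, l)
        nu = kth_nn_dist(xi, (xj for j, xj in enumerate(xs) if j != i), k)
        total += math.log(m / (n - 1)) + d * math.log(rho / nu)
    return total / n + digamma_int(k) - digamma_int(l)


rng = random.Random(4)
xs = [(rng.gauss(0.0, 1.0),) for _ in range(600)]  # sample from P = N(0, 1)
ys = [(rng.gauss(1.0, 1.0),) for _ in range(600)]  # sample from Q = N(1, 1)
d_hat = kl_divergence_knn(xs, ys)
print(d_hat)  # true D(P || Q) = 0.5 for these two Gaussians
```

With $k=l$ the digamma terms cancel; keeping them separate makes the role of the two neighbour orders explicit.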
The proof is complete. □
Appendix A Proofs of auxiliary results
The proofs of Lemmas 1, 2 and 3 are similar to the proofs of Lemmas 2.5, 3.1 and 3.2 in [7]. We provide them for the sake of completeness.
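Note that $\log\|x-y\|>e_{[N-1]}\ge1$ if $\|x-y\|>e_{[N]}$ and $N\in\mathbb{N}$.
Hence, for such $x,y$, one has $(\log\|x-y\|)^{\nu}\le(\log\|x-y\|)^{\nu_0}$ if $\nu\in(0,\nu_0]$. If $N\ge N_0$ then
$G_N(u)\le G_{N_0}(u)$ for $u\ge e_{[N-1]}\ge e_{[N_0-1]}$. Thus $K_{p,q}(\nu,N)\le K_{p,q}(\nu_0,N_0)<\infty$ for $\nu\in(0,\nu_0]$ and any integer $N\ge N_0$.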
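Assume that $Q_{p,q}(\varepsilon_1,R_1)<\infty$. Consider $Q_{p,q}(\varepsilon_1,R)$, where $R>0$.
If $0<R\le R_1$ then, for each $x\in\mathbb{R}^d$, according to the definition of $M_q$ one has
$M_q(x,R)\le M_q(x,R_1)$. Consequently, $Q_{p,q}(\varepsilon_1,R)\le Q_{p,q}(\varepsilon_1,R_1)<\infty$.
Let now $R>R_1$. One has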
[TABLE]
[TABLE]
Therefore
[TABLE]
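Suppose now that $Q_{p,q}(\varepsilon_1,R)<\infty$ for some $\varepsilon_1>0$ and $R>0$. Then, for
any $\varepsilon\in(0,\varepsilon_1]$,
the Lyapunov inequality yields $Q_{p,q}(\varepsilon,R)\le(Q_{p,q}(\varepsilon_1,R))^{\varepsilon/\varepsilon_1}<\infty$.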
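The Lyapunov inequality used here is Jensen's inequality for the concave map $t\mapsto t^{\varepsilon/\varepsilon_1}$: for a probability measure (here $p(x)\,dx$) and a nonnegative $f$, $\int f^{\varepsilon}\le\big(\int f^{\varepsilon_1}\big)^{\varepsilon/\varepsilon_1}$ whenever $0<\varepsilon\le\varepsilon_1$. A quick numerical check on a random discrete measure, with $f$ standing in for $M_q(\cdot,R)$ (an illustrative assumption; the names are hypothetical), is:

```python
import random


def lyapunov_sides(probs, values, eps, eps1):
    """Return (int f^eps dP, (int f^eps1 dP)^(eps/eps1)) for a discrete probability P.
    For f >= 0 and 0 < eps <= eps1 the Lyapunov inequality says lhs <= rhs."""
    lhs = sum(p * v ** eps for p, v in zip(probs, values))
    rhs = sum(p * v ** eps1 for p, v in zip(probs, values)) ** (eps / eps1)
    return lhs, rhs


rng = random.Random(5)
weights = [rng.random() for _ in range(50)]
total = sum(weights)
probs = [w / total for w in weights]                 # a probability vector
values = [rng.expovariate(0.1) for _ in range(50)]   # nonnegative f, playing M_q(x, R)
results = [lyapunov_sides(probs, values, eps, eps1)
           for eps, eps1 in [(0.2, 1.0), (0.5, 0.5), (1.0, 3.0)]]
for lhs, rhs in results:
    print(lhs, rhs, lhs <= rhs)
```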
Let $T_{p,q}(\varepsilon_2,R_2)<\infty$. Take $0<R\le R_2$. Then, for each $x\in\mathbb{R}^d$, according to the definition of $m_q$ we get
$0\le m_q(x,R_2)\le m_q(x,R)$. Hence $T_{p,q}(\varepsilon_2,R)\le T_{p,q}(\varepsilon_2,R_2)<\infty$.
Consider $R>R_2$. For each $x\in\mathbb{R}^d$ and every $a>0$, the function
$I_q(x,r)$
is continuous in $r$ on $(0,a]$.
Consider an arbitrary (fixed) $x\in S(q)\cap\Lambda(q)$. Then
the limit $\lim_{r\to0+}I_q(x,r)=q(x)$ exists. For such $x$, set $I_q(x,0):=q(x)$. Thus $I_q(x,\cdot)$ is continuous on any segment $[0,a]$. Hence, one can find $\widetilde{R}_2\in[0,R_2]$ such that
$m_q(x,R_2)=I_q(x,\widetilde{R}_2)$, and there exists $R_0\in[0,R]$ such that
$m_q(x,R)=I_q(x,R_0)$. If $R_0\le R_2$ then $m_q(x,R)=m_q(x,R_2)$ (since $m_q(x,R)\le m_q(x,R_2)$ for $R>R_2$ and $m_q(x,R)=I_q(x,R_0)\ge m_q(x,R_2)$ as $R_0\in[0,R_2]$).
Assume that
$R_0\in(R_2,R]$. Obviously $R_0>0$ as $R_2>0$. One has
[TABLE]
Thus in all cases ($R_0\in[0,R_2]$ and $R_0\in(R_2,R]$) one has $m_q(x,R)\ge(R_2/R)^d m_q(x,R_2)$ as $R_2<R$. Taking into account
the relation
$\mu(S(q)\setminus(S(q)\cap\Lambda(q)))=0$, we come to the inequality
[TABLE]
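Assume now that $T_{p,q}(\varepsilon_2,R)<\infty$ for some $\varepsilon_2>0$ and $R>0$. Then, for
any $\varepsilon\in(0,\varepsilon_2]$,
the Lyapunov inequality yields $T_{p,q}(\varepsilon,R)\le(T_{p,q}(\varepsilon_2,R))^{\varepsilon/\varepsilon_2}<\infty$. The proof is complete. □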
Proof of Lemma 2. We start with relation 1).
Note that if a function $g$ is measurable and bounded on a finite interval $(a,b]$ and $\nu$ is a finite measure on the Borel subsets of $(a,b]$, then
$\int_{(a,b]}g(x)\,\nu(dx)$ is finite. Thus, for each $a\in(0,1/e_{[N]}]$, using the integration by parts formula (see, e.g., [36], p. 245) we get
[TABLE]
Assume now that
$\int_{(0,1/e_{[N]}]}G_N(-\log u)\,dF(u)<\infty$.
Then by the monotone convergence theorem
[TABLE]
Clearly, the following nonnegative integral admits an estimate
Letting $a\to0+$ in (A.3), we come, by the monotone convergence theorem, to relation 1) of
our Lemma.
Suppose now that
[TABLE]
In view of (A.6) and the equality $\int_{(0,1/e_{[N]}]}F(u)(-g_N(u))\,du=\int_{(0,1/e_{[N]}]}F(u)\,d(-G_N(-\log u))$, by the monotone convergence theorem we have
$\lim_{b\to0+}\int_{(0,b]}F(u)\,d(-G_N(-\log u))=0$.
For any $c\in(0,b)$, we obtain the inequalities
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Let $c=b^2$ ($b\le1/e_{[N]}<1$). Then, for all positive $b$ small enough,
[TABLE]
Thus $\int_{(0,b]}F(u)\,d(-G_N(-\log u))\ge\frac12F(b^2)G_N(-\log(b^2))\ge0$.
It follows that
$F(b^2)G_N(-\log b^2)\to0$ as $b\to0$. Hence we come to (A.5), taking $a=b^2$.
Then (A.3) yields relation 1).
If one of the (nonnegative) integrals appearing in 1) were infinite and the other one finite,
we would come to a contradiction. Hence 1) is established.
In a similar way one can prove that relation 2) is valid. Therefore, we omit further details.
□
Proof of Lemma 3.
Take $x\in S(q)\cap\Lambda(q)$ and $R>0$. Assume that $m_q(x,R)=0$.
Since the function $I_q(x,r)$ defined in (2.1) is continuous in $(x,r)\in\mathbb{R}^d\times(0,\infty)$, there exists $\widetilde{R}\in[0,R]$
($\widetilde{R}=\widetilde{R}(x,R)$) such that $m_q(x,R)=I_q(x,\widetilde{R})$ (recall that $I_q(x,0):=\lim_{r\to0+}I_q(x,r)=q(x)$ for all $x\in\Lambda(q)$ by continuity).
If $\widetilde{R}=0$ then $m_q(x,R)=q(x)>0$ as $x\in S(q)\cap\Lambda(q)$.
Hence we have to consider $\widetilde{R}\in(0,R]$.
If $I_q(x,\widetilde{R})=0$ then $\int_{B(x,r)}q(y)\,dy=0$ for any $0<r\le\widetilde{R}$.
Thus (3.12) ensures that $q(x)=0$. However, $x\in S(q)\cap\Lambda(q)$.
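So $m_q(x,R)>0$ for $x\in S(q)\cap\Lambda(q)$. Thus, $S(q)\cap\Lambda(q)\subset D_q(R):=\{x\in S(q):m_q(x,R)>0\}$. It remains to note
that $S(q)\setminus\Lambda(q)\subset\mathbb{R}^d\setminus\Lambda(q)$ and $\mu(\mathbb{R}^d\setminus\Lambda(q))=0$. Therefore $\mu(S(q)\setminus D_q(R))=0$. □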
Proof of Lemma 4.
We verify that, for given $N\in\mathbb{N}$ and $\tau>0$, there exist $a:=a(\tau)\ge0$ and $b:=b(N,\tau)\ge0$ such that, for any $c\ge0$,
[TABLE]
For $c=0$ the statement is obviously true. Let $c>0$.
One can easily see that $\log_{[N]}(\tau c)/\log_{[N]}(c)\to1$ as $c\to\infty$.
Hence one can find $c_0(N,\tau)$ such that, for all $c\ge c_0(N,\tau)$, the inequality $\log_{[N]}(\tau c)/\log_{[N]}(c)\le2$ is valid. Consequently, for $c\ge c_0(N,\tau)$,
[TABLE]
For all $0\le c\le c_0(N,\tau)$ we write $G_N(\tau c)\le G_N(\tau c_0(N,\tau)):=b(N,\tau)$.
Therefore, for any c≥0, we come to (A.7).
Thus, for any $\nu>0$ and $x,y\in\mathbb{R}^d$, $x\neq y$, one has
[TABLE]
Proof of Lemma 6.
For $t\in[0,e_{[N-1]}]$, the function $G_N(t)\equiv0$ is convex.
We show that $G_N$ is convex on $(e_{[N-1]},\infty)$.
Consider $t>e_{[N-1]}$. Write $\prod_{\emptyset}:=1$ and $\sum_{\emptyset}:=0$. Then, for $N\in\mathbb{N}$,
[TABLE]
Obviously, $\big(\frac{1}{\log_{[k]}(t)}\big)'=-\frac{1}{t\log^2_{[k]}(t)}\prod_{s=1}^{k-1}\frac{1}{\log_{[s]}(t)}$, $k\in\mathbb{N}$. Thus, for $t>e_{[N-1]}$, we get
[TABLE]
For $N=1$ and $t>0$, we have $(G_1(t))''=\frac1t>0$. Take now $N>1$.
Clearly, for $t>e_{[N-1]}$, one has $\frac1t\prod_{j=1}^{N-1}\frac{1}{\log_{[j]}(t)}>0$ because $\log_{[j]}(t)>\log_{[j]}(e_{[N-1]})=e_{[N-1-j]}\ge1>0$ when $1\le j\le N-1$.
Observe also that
[TABLE]
The last inequality is established by induction in N.
Thus, in view of (A.8), we have proved that the inequality $(G_N(t))''>0$ holds for all $t>e_{[N-1]}$ and $N\in\mathbb{N}$. Hence the function $G_N(t)$ is (strictly) convex on $(e_{[N-1]},\infty)$.
Let $h:[a,\infty)\to\mathbb{R}$ be a continuous nondecreasing function. If the restrictions of $h$ to $[a,b]$ and $(b,\infty)$ (where
$a<b$) are convex functions then, in general, it is not true that $h$ is convex on $[a,\infty)$.
However, we can show that $G_N$ is convex on $[0,\infty)$. Note that $G_N$ is convex on
$[e_{[N-1]},\infty)$ since it is convex on $(e_{[N-1]},\infty)$ and continuous on $[e_{[N-1]},\infty)$. Take now any $z\in[0,e_{[N-1]}]$, $y\in(e_{[N-1]},\infty)$ and $s\in[0,1]$. Then $G_N(sz+(1-s)y)\le G_N(se_{[N-1]}+(1-s)y)\le sG_N(e_{[N-1]})+(1-s)G_N(y)=(1-s)G_N(y)=sG_N(z)+(1-s)G_N(y)$, as $G_N(z)=0$.
Thus, for each $N\in\mathbb{N}$, the function $G_N(\cdot)$ is convex on $\mathbb{R}_+$. □
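The convexity and monotonicity of $G_N$ can also be checked numerically. The sketch assumes the form $G_N(t)=t\log_{[N]}(t)$ for $t>e_{[N-1]}$ and $G_N(t)=0$ on $[0,e_{[N-1]}]$, with $e_{[0]}=1$; this is consistent with $(G_1(t))''=1/t$ above, but the definition stated earlier in the paper takes precedence. It tests midpoint convexity on a grid, a necessary (not sufficient) condition for convexity; all names are hypothetical.

```python
import math


def e_iter(n):
    """Assumed convention: e_[0] = 1, e_[n] = exp(e_[n-1])."""
    v = 1.0
    for _ in range(n):
        v = math.exp(v)
    return v


def log_iter(n, t):
    """Iterated logarithm log_[n](t)."""
    for _ in range(n):
        t = math.log(t)
    return t


def G(N, t):
    """Assumed form of G_N: t * log_[N](t) for t > e_[N-1], and 0 on [0, e_[N-1]]."""
    return t * log_iter(N, t) if t > e_iter(N - 1) else 0.0


def convexity_violations(N, grid):
    """Count pairs with G((a+b)/2) > (G(a)+G(b))/2 (up to rounding) on a grid."""
    bad = 0
    for a in grid:
        for b in grid:
            mid = G(N, (a + b) / 2)
            if mid > (G(N, a) + G(N, b)) / 2 + 1e-9 * (1 + abs(mid)):
                bad += 1
    return bad


grid = [0.25 * i for i in range(161)]  # the interval [0, 40]
for N in (1, 2, 3):
    vals = [G(N, t) for t in grid]
    nondecreasing = all(u <= v for u, v in zip(vals, vals[1:]))
    print(N, nondecreasing, convexity_violations(N, grid))
```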
Proof of Corollary 5.
The proof (i.e., checking the conditions of both Theorems 1 and 2) is quite similar to the proof of Corollary 2.11 in [7].
Acknowledgements
The authors are grateful to Professor A.Tsybakov for useful discussions.
This work is supported by the Lomonosov Moscow State University under grant “Modern Problems of the Fundamental Mathematics and Mechanics”.
Bibliography
[1] Alonso-Ruiz, P. and Spodarev, E. (2016). Entropy-based inhomogeneity detection in fiber materials. Methodol. Comput. Appl. Probab. Published online: 27 November 2017, doi.org/10.1007/s11009-017-9603-2.
[2] Berrett, T.B., Samworth, R.J. and Yuan, M. (2019). Efficient multivariate entropy estimation via k-nearest neighbour distances. Ann. Statist. 47, 288–318.
[3] Biau, G. and Devroye, L. (2015). Lectures on the Nearest Neighbor Method. Springer, Cham.
[4] Billingsley, P. (1999). Convergence of Probability Measures, 2nd edn. John Wiley, New York.
[6] Borkar, V.S. (1995). Probability Theory. An Advanced Course. Springer, New York.
[7] Bulinski, A. and Dimitrov, D. (2019). Statistical estimation of the Shannon entropy. Acta Mathematica Sinica, English Series 35, 17–46.
[8] Bulinski, A. and Kozhevin, A. (2018). Statistical estimation of conditional Shannon entropy. ESAIM: Probability and Statistics. Published online: November 28, 1–35.