Statistical estimation of the Kullback-Leibler divergence

Alexander Bulinski; Denis Dimitrov

arXiv:1907.00196·math.ST·July 2, 2019


TL;DR

This paper develops conditions under which estimators of the Kullback-Leibler divergence, based on k-nearest neighbor statistics, are asymptotically unbiased and consistent for probability measures in R^d, including Gaussian measures.

Contribution

It introduces new asymptotic unbiasedness and consistency results for Kullback-Leibler divergence estimators based on k-nearest neighbor statistics, covering in particular the case of Gaussian measures.

Findings

1. Estimates are asymptotically unbiased under wide conditions.
2. Estimates are L^2-consistent for a broad class of probability measures.
3. New results on Kozachenko-Leonenko entropy estimators are derived.

Abstract

Wide conditions are provided to guarantee asymptotic unbiasedness and L^2-consistency of the introduced estimates of the Kullback-Leibler divergence for probability measures in R^d having densities w.r.t. the Lebesgue measure. These estimates are constructed by means of two independent collections of i.i.d. observations and involve the specified k-nearest neighbor statistics. In particular, the established results are valid for estimates of the Kullback-Leibler divergence between any two Gaussian measures in R^d with nondegenerate covariance matrices. As a byproduct we obtain new statements concerning the Kozachenko-Leonenko estimators of the Shannon differential entropy.

Equations

D(\mathbb{P}||\mathbb{Q}):=\int_{S}\log\Big(\frac{d\mathbb{P}}{d\mathbb{Q}}\Big)\,d\mathbb{P}\quad\text{if }\mathbb{P}\ll\mathbb{Q},

D(\mathbb{P}||\mathbb{Q})=\int_{\mathbb{R}^{d}}p(x)\log\Big(\frac{p(x)}{q(x)}\Big)\,dx\quad\text{for }\mathbb{P}\ll\mathbb{Q},

R_{n,k}(i):=\|X_{i}-X_{(k)}(X_{i},\mathbb{X}_{n}\setminus\{X_{i}\})\|,\qquad V_{m,l}(i):=\|X_{i}-Y_{(l)}(X_{i},\mathbb{Y}_{m})\|,\qquad i=1,\ldots,n.

\widehat{D}_{n,m}(k,l):=\psi(k)-\psi(l)+\frac{1}{n}\sum_{i=1}^{n}\log\Big(\frac{m\,V_{m,l}^{d}(i)}{(n-1)\,R_{n,k}^{d}(i)}\Big).

\widehat{D}_{n,m}(k)=\frac{d}{n}\sum_{i=1}^{n}\log\Big(\frac{V_{m,l}(i)}{R_{n,k}(i)}\Big)+\log\Big(\frac{m}{n-1}\Big),

\widetilde{D}_{n,m}(\mathcal{K}_{n},\mathcal{L}_{n}):=\frac{1}{n}\sum_{i=1}^{n}\big(\psi(k_{i})-\psi(l_{i})\big)+\log\Big(\frac{m}{n-1}\Big)+\frac{d}{n}\sum_{i=1}^{n}\log\Big(\frac{V_{m,l_{i}}(i)}{R_{n,k_{i}}(i)}\Big),

I_{f}(x,r):=\frac{\int_{B(x,r)}f(y)\,dy}{r^{d}V_{d}},

M_{f}(x,R):=\sup_{r\in(0,R]}I_{f}(x,r),\qquad m_{f}(x,R):=\inf_{r\in(0,R]}I_{f}(x,r),

G_{N}(t):=\begin{cases}0, & t\in[0,e_{[N-1]}],\\ t\log_{[N]}(t), & t\in(e_{[N-1]},\infty).\end{cases}

K_{p,q}(\nu,N,t):=\iint_{x,y\in\mathbb{R}^{d},\;\|x-y\|>t}G_{N}\big(|\log\|x-y\||^{\nu}\big)\,p(x)q(y)\,dx\,dy,

Q_{p,q}(\varepsilon,R):=\int_{\mathbb{R}^{d}}M_{q}^{\varepsilon}(x,R)\,p(x)\,dx,

T_{p,q}(\varepsilon,R):=\int_{\mathbb{R}^{d}}m_{q}^{-\varepsilon}(x,R)\,p(x)\,dx.

K_{p,q}(\nu,N,u)\leq K_{p,q}(\nu,N,t)\leq K_{p,q}(\nu,N,u)+\max\big\{G_{N}(|\log t|^{\nu}),G_{N}(|\log u|^{\nu})\big\}.

\lim_{n,m\to\infty}{\sf E}\,\widehat{D}_{n,m}(k,l)=D({\sf P}_{X}||{\sf P}_{Y}).

\int_{\mathbb{R}^{d}}p(x)|\log q(x)|\,dx=\int_{q(x)\geq 1}p(x)\log q(x)\,dx+\int_{q(x)<1}p(x)\log\frac{1}{q(x)}\,dx\leq\frac{1}{\varepsilon_{1}}Q_{p,q}(\varepsilon_{1},R_{1})+\frac{1}{\varepsilon_{2}}T_{p,q}(\varepsilon_{2},R_{2})<\infty.

L_{p,q}(\nu):=\int_{\mathbb{R}^{d}}\int_{\mathbb{R}^{d}}|\log\|x-y\||^{\nu}\,p(x)q(y)\,dx\,dy<\infty.

f(x)\leq M(f),\quad x\in\mathbb{R}^{d}.

f(x)\geq m(f),\quad x\in S(f).

\lim_{n,m\to\infty}{\sf E}\big(\widehat{D}_{n,m}(k,l)-D({\sf P}_{X}||{\sf P}_{Y})\big)^{2}=0.

m_{f}(x,R)\geq c\,f(x),\quad x\in\mathbb{R}^{d}.

\int_{\mathbb{R}^{d}}q(x)^{-\varepsilon}p(x)\,dx<\infty,

D({\sf P}_{X}||{\sf P}_{Y})=\frac{1}{2}\Big(\operatorname{tr}(\Sigma_{Y}^{-1}\Sigma_{X})+(\mu_{Y}-\mu_{X})^{T}\Sigma_{Y}^{-1}(\mu_{Y}-\mu_{X})-d+\log\frac{\det\Sigma_{Y}}{\det\Sigma_{X}}\Big).

M_{f}(x,R)\leq C\,f(x),\quad x\in S(f).

\int_{\mathbb{R}^{d}}q(x)^{\varepsilon}p(x)\,dx<\infty

\widehat{H}_{n}(k):=\frac{1}{n}\sum_{i=1}^{n}\log\Big(\frac{R_{n,k}^{d}(i)\,V_{d}\,(n-1)}{e^{\psi(k)}}\Big).

\widetilde{H}_{n}(\mathcal{K}_{n}):=-\frac{1}{n}\sum_{i=1}^{n}\psi(k_{i})+\log V_{d}+\log(n-1)+\frac{d}{n}\sum_{i=1}^{n}\log R_{n,k_{i}}(i),

\widehat{D}_{n,m}(k,l)=\psi(k)-\psi(l)+\frac{1}{n}\sum_{i=1}^{n}\big(\log\phi_{m,l}(i)-\log\zeta_{n,k}(i)\big).

{\sf E}\,\frac{1}{n}\sum_{i=1}^{n}\log\phi_{m,l}(i)={\sf E}\log\phi_{m,l}(1)\to\psi(l)-\log V_{d}-\int_{\mathbb{R}^{d}}p(x)\log q(x)\,dx,\quad m\to\infty.

{\sf E}\,\frac{1}{n}\sum_{i=1}^{n}\log\zeta_{n,k}(i)={\sf E}\log\zeta_{n,k}(1)\to\psi(k)-\log V_{d}-\int_{\mathbb{R}^{d}}p(x)\log p(x)\,dx,\quad n\to\infty.

{\sf E}\,\widehat{D}_{n,m}(k,l)\to-\int_{\mathbb{R}^{d}}p(x)\log q(x)\,dx+\int_{\mathbb{R}^{d}}p(x)\log p(x)\,dx=D({\sf P}_{X}||{\sf P}_{Y}),\quad n,m\to\infty.


Full text


*Dept. of Mathematics and Mechanics, Lomonosov Moscow State University, Moscow 119234, Russia*


Key words: Kullback-Leibler divergence; Shannon differential entropy; statistical estimators; asymptotic behavior; Gaussian model.

AMS (2010) Subject Classification: 60F25, 62G20, 62H12

1 Introduction

The Kullback-Leibler divergence plays an important role in various domains such as statistical inference (see, e.g., [25], [28]), machine learning ([5], [32]), computer vision ([11], [13]), network security ([23], [44]), feature selection and classification ([22], [29], [41]), physics ([17]), biology ([9]) and finance ([45]), among others. Recall that this divergence between probability measures $\mathbb{P}$ and $\mathbb{Q}$ on a space $(S,\mathcal{B})$ is defined by

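$$D(\mathbb{P}||\mathbb{Q}):=\int_{S}\log\Big(\frac{d\mathbb{P}}{d\mathbb{Q}}\Big)\,d\mathbb{P}\quad\text{if }\mathbb{P}\ll\mathbb{Q},\tag{1.1}$$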

where $\frac{d\mathbb{P}}{d\mathbb{Q}}$ stands for the Radon-Nikodym derivative; otherwise, $D(\mathbb{P}||\mathbb{Q}):=+\infty$. We use natural logarithms (changing the base only multiplies the divergence by a constant factor, which is not essential here). It is worth emphasizing that mutual information, widely used in many research directions, is a special case of the Kullback-Leibler divergence: it is the divergence between a joint distribution and the product of its marginals. For a comparison of various $f$-divergence measures see [34].

If $(S,\mathcal{B})=(\mathbb{R}^{d},\mathcal{B}(\mathbb{R}^{d}))$ and (absolutely continuous) $\mathbb{P}$ and $\mathbb{Q}$ have densities, $p(x)$ and $q(x)$ , $x\in\mathbb{R}^{d}$ , w.r.t. the Lebesgue measure $\mu$ , then (1.1) can be rewritten as

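$$D(\mathbb{P}||\mathbb{Q})=\int_{\mathbb{R}^{d}}p(x)\log\Big(\frac{p(x)}{q(x)}\Big)\,dx\quad\text{for }\mathbb{P}\ll\mathbb{Q},\tag{1.2}$$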

otherwise, $D({\mathbb{P}}||{\mathbb{Q}})=+\infty$ . To simplify notation we write $dx$ instead of $\mu(dx)$ . We formally set $0/0:=0$ , $0\cdot\log 0:=0$ . For a (version of) probability density $f$ denote by $S(f):=\{x\in\mathbb{R}^{d}:f(x)>0\}$ its support. Clearly, the integral in (1.2) is taken over $S(p)$ . Observe that when $\mathbb{P}\ll\mu$ and $\mathbb{Q}\ll\mu$ then $\mathbb{P}\ll\mathbb{Q}$ if and only if $\mathbb{P}(S(p)\setminus S(q))=0$ . Formula (1.2) is closely related to cross-entropy and the Shannon differential entropy.

Usually one has to reconstruct the measures (describing a stochastic model under consideration) or their characteristics using some collections of observations. In the pioneering paper [19], an estimator of the Shannon differential entropy based on nearest neighbor statistics was proposed. This estimate was subsequently studied and applied in a series of papers. Moreover, estimators of the Rényi entropy, mutual information and the Kullback-Leibler divergence have appeared (see, e.g., [20], [21], [42]). However, the authors of [27] pointed out gaps in the known proofs concerning the limit behavior of such statistics. This issue attracted our attention and motivated our study of the declared asymptotic properties. Thus, in the recent work [7], new functionals were introduced to prove asymptotic unbiasedness and $L^{2}$-consistency of the Kozachenko-Leonenko estimators of the Shannon differential entropy. The present paper extends this approach to the estimation of the Kullback-Leibler divergence. Instead of nearest neighbor statistics we employ $k$-nearest neighbor statistics (on order statistics see, e.g., [3]) and also use more general forms of the mentioned functionals.

Let $X$ and $Y$ be random vectors taking values in $\mathbb{R}^{d}$ and having distributions ${\sf P}_{X}$ and ${\sf P}_{Y}$ , respectively (further we consider $\mathbb{P}={\sf P}_{X}$ and $\mathbb{Q}={\sf P}_{Y}$ ). Consider i.i.d. random vectors $X_{1},X_{2},\ldots,$ and i.i.d. random vectors $Y_{1},Y_{2},\ldots,$ with $law(X_{1})=law(X)$ and $law(Y_{1})=law(Y)$ . Assume that $\{X_{i},Y_{i},i\in\mathbb{N}\}$ are independent. We are interested in statistical estimation of $D({\sf P}_{X}||{\sf P}_{Y})$ constructed by means of observations $\mathbb{X}_{n}:=\{X_{1},\ldots,X_{n}\}$ and $\mathbb{Y}_{m}:=\{Y_{1},\ldots,Y_{m}\}$ , $n,m\in\mathbb{N}$ . All random variables under consideration are defined on a complete probability space $(\Omega,\mathcal{F},{\sf P})$ .

For a finite set $E=\{z_{1},\ldots,z_{N}\}\subset\mathbb{R}^{d}$, where $z_{i}\neq z_{j}$ $(i\neq j)$, and a vector $v\in\mathbb{R}^{d}$, renumber the points of $E$ as $z_{(1)}(v),\ldots,z_{(N)}(v)$ in such a way that $\|v-z_{(1)}\|\leq\ldots\leq\|v-z_{(N)}\|$, where $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^{d}$. If several points $z_{i_{1}},\ldots,z_{i_{s}}$ are at the same distance from $v$, we number them in increasing order of their indices $i_{1},\ldots,i_{s}$. In other words, for $k=1,\ldots,N$, $z_{(k)}(v)$ is the $k$-NN ($k$-th nearest neighbor) of $v$ in the set $E$. To indicate that $z_{(k)}(v)$ is constructed by means of $E$ we write $z_{(k)}(v,E)$. Fix $k\in\{1,\ldots,n-1\}$, $l\in\{1,\ldots,m\}$ and (for each $\omega\in\Omega$) put

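$$R_{n,k}(i):=\|X_{i}-X_{(k)}(X_{i},\mathbb{X}_{n}\setminus\{X_{i}\})\|,\qquad V_{m,l}(i):=\|X_{i}-Y_{(l)}(X_{i},\mathbb{Y}_{m})\|,\qquad i=1,\ldots,n.$$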

We assume that $X$ and $Y$ have densities $p=\frac{d{\sf P}_{X}}{d\mu}$ and $q=\frac{d{\sf P}_{Y}}{d\mu}$ . Then with probability one all points in $\mathbb{X}_{n}$ are distinct as well as points of $\mathbb{Y}_{m}$ .
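The $k$-NN convention above (ties broken by increasing index) and the distances $R_{n,k}(i)$, $V_{m,l}(i)$ can be sketched in Python with NumPy as follows. This is a brute-force illustration only, assuming the samples are stored as arrays of shape `(n, d)` and `(m, d)`; the function names `knn_order`, `kth_neighbor_distance`, `R_nk` and `V_ml` are hypothetical and not taken from the paper. A stable sort reproduces the tie-breaking rule exactly, while the $k$-th distance itself does not depend on how ties are ordered.

```python
import numpy as np


def knn_order(v, E):
    """Indices of the rows of E listed as z_(1)(v), ..., z_(N)(v).

    Rows are sorted by Euclidean distance to v; a stable sort keeps equal
    distances in increasing index order, matching the tie-breaking rule.
    """
    dist = np.linalg.norm(E - v, axis=1)
    return np.argsort(dist, kind="stable")


def kth_neighbor_distance(v, E, k):
    """Distance ||v - z_(k)(v, E)|| for k in 1..len(E)."""
    dist = np.linalg.norm(E - v, axis=1)
    return np.sort(dist)[k - 1]


def R_nk(X, k):
    """R_{n,k}(i): distance from X_i to its k-NN in X without X_i."""
    return np.array([
        kth_neighbor_distance(X[i], np.delete(X, i, axis=0), k)
        for i in range(len(X))
    ])


def V_ml(X, Y, l):
    """V_{m,l}(i): distance from X_i to its l-NN in the sample Y."""
    return np.array([kth_neighbor_distance(x, Y, l) for x in X])


# Two points at distance 1 from v = 0: the one with the smaller index comes first.
E = np.array([[1.0], [-1.0], [2.0]])
print(knn_order(np.array([0.0]), E))  # [0 1 2]
```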

Introduce an estimate of $D({\sf P}_{X}||{\sf P}_{Y})$ , for $n\geq k+1$ and $m\geq l$ , letting

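$$\widehat{D}_{n,m}(k,l):=\psi(k)-\psi(l)+\frac{1}{n}\sum_{i=1}^{n}\log\Big(\frac{m\,V_{m,l}^{d}(i)}{(n-1)\,R_{n,k}^{d}(i)}\Big).\tag{1.3}$$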

Here $\psi(t)=\frac{d}{dt}\log{\Gamma(t)}=\frac{\Gamma^{\prime}(t)}{\Gamma(t)}$ is the digamma function, $t>0$ .
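A minimal brute-force sketch of the estimate (1.3) in Python/NumPy, assuming integer $k$ and $l$, so that the digamma function can be evaluated through the identity $\psi(k)=-\gamma+\sum_{j=1}^{k-1}1/j$, where $\gamma$ is the Euler-Mascheroni constant. The names `digamma_int` and `kl_knn_estimate` are hypothetical, and the $O(nm)$ distance matrices make this suitable only for small samples.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(k):
    """psi(k) for a positive integer k: psi(k) = -gamma + sum_{j=1}^{k-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def kl_knn_estimate(X, Y, k=1, l=1):
    """Brute-force sketch of hat D_{n,m}(k, l) from samples X (n x d) and Y (m x d)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, d = X.shape
    m = Y.shape[0]
    # Distances within X; the diagonal is set to +inf so X_i is never its own neighbor.
    dxx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dxx, np.inf)
    R = np.sort(dxx, axis=1)[:, k - 1]          # R_{n,k}(i)
    # Distances from each X_i to every Y_j.
    dxy = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    V = np.sort(dxy, axis=1)[:, l - 1]          # V_{m,l}(i)
    log_terms = np.log(m * V**d) - np.log((n - 1) * R**d)
    return digamma_int(k) - digamma_int(l) + log_terms.mean()


rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(1500, 1))   # sample from P_X = N(0, 1)
Y = rng.normal(1.0, 1.0, size=(1500, 1))   # sample from P_Y = N(1, 1)
print(kl_knn_estimate(X, Y, k=3, l=3))      # should be close to the true value 0.5
```

For these two unit-variance Gaussians the exact divergence is $1/2$, so the printed value gives a rough sanity check; a k-d tree would be the natural replacement for the dense distance matrices on larger samples.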

Remark 1

If $k=l$ then

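$$\widehat{D}_{n,m}(k)=\frac{d}{n}\sum_{i=1}^{n}\log\Big(\frac{V_{m,l}(i)}{R_{n,k}(i)}\Big)+\log\Big(\frac{m}{n-1}\Big),$$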

and we come to formula (5) in [42].
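To see why the digamma terms disappear in Remark 1, set $l=k$ in (1.3) (so that $V_{m,l}=V_{m,k}$) and split the logarithm:

```latex
\widehat{D}_{n,m}(k,k)
=\psi(k)-\psi(k)+\frac{1}{n}\sum_{i=1}^{n}\log\Big(\frac{m\,V_{m,k}^{d}(i)}{(n-1)\,R_{n,k}^{d}(i)}\Big)
=\frac{d}{n}\sum_{i=1}^{n}\log\Big(\frac{V_{m,k}(i)}{R_{n,k}(i)}\Big)+\log\Big(\frac{m}{n-1}\Big).
```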

Remark 2

All our results will be valid for the following generalization of statistics $\widehat{D}_{n,m}(k,l)$ :

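$$\widetilde{D}_{n,m}(\mathcal{K}_{n},\mathcal{L}_{n}):=\frac{1}{n}\sum_{i=1}^{n}\big(\psi(k_{i})-\psi(l_{i})\big)+\log\Big(\frac{m}{n-1}\Big)+\frac{d}{n}\sum_{i=1}^{n}\log\Big(\frac{V_{m,l_{i}}(i)}{R_{n,k_{i}}(i)}\Big),\tag{1.4}$$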

where $\mathcal{K}_{n}:=\{k_{i}\}_{i=1}^{n}$ , $\mathcal{L}_{n}:=\{l_{i}\}_{i=1}^{n}$ and, for some $r\in\mathbb{N}$ and all $i\in\mathbb{N}$ , $k_{i}\leq r$ , $l_{i}\leq r$ . Note that (1.4) is well-defined for $n\geq\max_{i=1,\ldots,n}k_{i}+1$ , $m\geq\max_{i=1,\ldots,n}l_{i}$ . We will only consider the estimates (1.3) since the study of $\widetilde{D}_{n,m}(\mathcal{K}_{n},\mathcal{L}_{n})$ follows the same lines.
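The generalization (1.4) only changes which neighbor is used for each point. A hedged Python sketch, assuming bounded integer sequences `ks` $=(k_i)$ and `ls` $=(l_i)$; the function names are hypothetical, and the digamma values again use the integer-argument identity $\psi(k)=-\gamma+\sum_{j<k}1/j$.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(k):
    """psi(k) = -gamma + 1 + 1/2 + ... + 1/(k-1) for an integer k >= 1."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def kl_knn_varying(X, Y, ks, ls):
    """Sketch of tilde D_{n,m}(K_n, L_n) with point-specific k_i and l_i."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, d = X.shape
    m = Y.shape[0]
    ks = np.asarray(ks, dtype=int)
    ls = np.asarray(ls, dtype=int)
    rows = np.arange(n)
    dxx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dxx, np.inf)                      # exclude X_i itself
    R = np.sort(dxx, axis=1)[rows, ks - 1]             # R_{n,k_i}(i)
    dxy = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    V = np.sort(dxy, axis=1)[rows, ls - 1]             # V_{m,l_i}(i)
    psi_term = np.mean([digamma_int(a) - digamma_int(b) for a, b in zip(ks, ls)])
    return psi_term + np.log(m / (n - 1)) + d * np.mean(np.log(V / R))


X = np.array([[0.0], [1.0], [3.0]])
Y = np.array([[0.5], [10.0]])
print(kl_knn_varying(X, Y, ks=[2, 1, 1], ls=[1, 2, 1]))
```

With constant sequences $k_i\equiv k$, $l_i\equiv l$ this reduces to the estimate (1.3).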

Developing the approach of [7] to the analysis of the asymptotic behavior of the Kozachenko-Leonenko estimates of the Shannon differential entropy (introduced in [35], Part III, Section 20), we encounter new complications due to dealing with $k$-nearest neighbor statistics for $k\in\mathbb{N}$ (not only for $k=1$). Accordingly, in the framework of the Kullback-Leibler divergence estimation, we propose a new way to bound the function $1-F_{m,l,x}(u)$, which plays the key role in the proofs (see formula (3.10)). Also, instead of the function $G(t)=t\log t$ (for $t>1$) used in [7] to study the Shannon entropy estimates, we employ a regularly varying function $G_{N}(t)=t\log_{[N]}(t)$, where (for $t$ large enough) $\log_{[N]}(t)$ is the $N$-fold iteration of the logarithmic function and $N\in\mathbb{N}$ is chosen arbitrarily. Hence, in the definition of the integral functional $K_{p,q}(\nu,N,t)$ by formula (2.4) below, one can take a function $G_{N}(z)$ whose growth rate (as $z\to\infty$) is close to that of the function $z$. Moreover, this permits a generalization of the results of [7]. Here we invoke convexity of $G_{N}$ (see Lemma 6) to provide simpler conditions for asymptotic unbiasedness and $L^{2}$-consistency of the Shannon differential entropy estimates than those employed in [7].

We mention in passing that there are investigations treating other important aspects of mutual information and entropy estimation. In [1] entropy estimators are applied to the detection of inhomogeneities in fiber materials. Mixed models and conditional entropy estimation are studied, e.g., in [8], [10]. The central limit theorem for the Kozachenko-Leonenko estimates is established in [12]. Limit theorems for point processes on manifolds are employed in [30] to analyze the behavior of the Shannon and the Rényi entropy estimates. Convergence rates for (truncated) Shannon entropy estimates are obtained in [40] for the one-dimensional case, see also [37] for the multidimensional case. Ensemble estimation of density functionals is considered in [38]. A recursive rectilinear partitioning for differential entropy estimation is considered in [39]. Mutual information estimation by local Gaussian approximation is developed in [16]. Note that various deep results (including the central limit theorem) were obtained for the Kullback-Leibler estimates under certain conditions imposed on derivatives of the unknown densities (see, e.g., the recent papers [2], [24], [33]). Our goal is to provide wide conditions for the asymptotic unbiasedness and $L^{2}$-consistency of the Kullback-Leibler divergence estimates (1.3), as $n,m\to\infty$, without such smoothness hypotheses. We also do not assume that the densities have bounded supports.

The paper is organized as follows. In Section 2 we formulate the main results, Theorems 1 and 2. Their proofs are presented in Sections 3 and 4, respectively. Proofs of several lemmas are given in the Appendix (Section 5).

2 Main results

Some notation is necessary. For a probability density $f$ in $\mathbb{R}^{d}$ , $x\in\mathbb{R}^{d}$ , $r>0$ and $R>0$ , as in [7], introduce the functions (or functionals depending on parameters)

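$$I_{f}(x,r):=\frac{\int_{B(x,r)}f(y)\,dy}{r^{d}V_{d}},\tag{2.1}$$

$$M_{f}(x,R):=\sup_{r\in(0,R]}I_{f}(x,r),\qquad m_{f}(x,R):=\inf_{r\in(0,R]}I_{f}(x,r),\tag{2.2}$$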

where $B(x,r):=\{y\in\mathbb{R}^{d}:\|x-y\|\leq r\}$ and $V_{d}:=\mu(B(0,1))$ is the volume of the unit ball in $\mathbb{R}^{d}$. Observe that replacing $\sup_{r\in(0,R]}$ with $\sup_{r\in(0,\infty)}$ in the definition of $M_{f}(x,R)$ leads to the celebrated Hardy-Littlewood maximal function $M_{f}(x)$, widely used in harmonic analysis. Some properties of the function $\int_{B(x,r)}f(y)\,dy$ are considered, e.g., in [14]. According to Lemma 2.1 of [7], for a probability density $f$ in $\mathbb{R}^{d}$, the function $I_{f}(x,r)$ defined in (2.1) is continuous in $(x,r)\in\mathbb{R}^{d}\times(0,\infty)$.
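For intuition about $I_f$, $M_f$ and $m_f$, here is a sketch for the standard normal density in $d=1$, where $B(x,r)=[x-r,x+r]$ and $V_1=2$. The supremum and infimum over $r\in(0,R]$ are only approximated on a finite grid of radii, which is an assumption of this sketch rather than part of the definition; all function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def I_f(x, r):
    """I_f(x, r) for the standard normal density in d = 1 (V_1 = 2)."""
    mass = normal_cdf(x + r) - normal_cdf(x - r)   # integral of f over [x - r, x + r]
    return mass / (2.0 * r)


def M_f(x, R, grid=2000):
    """Grid approximation of sup_{r in (0, R]} I_f(x, r)."""
    return max(I_f(x, R * j / grid) for j in range(1, grid + 1))


def m_f(x, R, grid=2000):
    """Grid approximation of inf_{r in (0, R]} I_f(x, r)."""
    return min(I_f(x, R * j / grid) for j in range(1, grid + 1))


density_at_0 = 1.0 / math.sqrt(2.0 * math.pi)
print(I_f(0.0, 1e-4), density_at_0)   # nearly equal (Lebesgue differentiation)
print(m_f(0.0, 1.0), M_f(0.0, 1.0))   # m_f <= f(0) <= M_f, up to grid error
```

At $x=0$ the ball average decreases in $r$ for this unimodal symmetric density, so $m_f(0,R)=I_f(0,R)$ and $M_f(0,R)$ is the limit $f(0)$ as $r\to 0$.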

Set $e_{[0]}:=1$ and $e_{[N]}:=\exp\{e_{[N-1]}\}$ , $N\in\mathbb{N}$ . Introduce a function $\log_{[1]}(t):=\log t$ , $t>0$ . For $N\in\mathbb{N}$ , $N>1$ , set $\log_{[N]}(t):=\log(\log_{[N-1]}(t)).$ Evidently, this function (for $N>1$ ) is defined if $t>e_{[N-2]}$ . For $N\in\mathbb{N}$ , consider the continuous nondecreasing function $G_{N}:\mathbb{R}_{+}\to\mathbb{R}_{+}$ , given by formula

$$G_{N}(t):=\begin{cases}0, & t\in[0,e_{[N-1]}],\\ t\log_{[N]}(t), & t\in(e_{[N-1]},\infty).\end{cases}$$
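The iterated exponentials $e_{[N]}$, the iterated logarithm $\log_{[N]}$ and $G_N$ can be sketched directly in Python. The names `e_iter`, `log_iter` and `G` are hypothetical; note that $e_{[3]}\approx 3.8\cdot 10^{6}$ while $e_{[4]}$ already overflows a double, so only small $N$ are usable numerically.

```python
import math


def e_iter(N):
    """e_[0] = 1 and e_[N] = exp(e_[N-1])."""
    value = 1.0
    for _ in range(N):
        value = math.exp(value)
    return value


def log_iter(t, N):
    """N-fold iterated logarithm log_[N](t); needs t > e_[N-2] when N > 1."""
    for _ in range(N):
        t = math.log(t)
    return t


def G(t, N):
    """G_N(t) = 0 on [0, e_[N-1]] and t * log_[N](t) for t > e_[N-1]."""
    if t <= e_iter(N - 1):
        return 0.0
    return t * log_iter(t, N)


for N in (1, 2, 3):
    # log_[N](e_[N]) = 1, hence G_N(e_[N]) = e_[N]
    print(N, e_iter(N), G(e_iter(N), N))
```

Since $\log_{[N]}(e_{[N-1]})=0$, the two branches of $G_N$ meet continuously at $t=e_{[N-1]}$.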

For probability densities $p,q$ in $\mathbb{R}^{d}$ , some $N\in\mathbb{N}$ and positive constants $\nu,t,\varepsilon,R$ , we define the following functionals with values in $[0,\infty]$

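$$K_{p,q}(\nu,N,t):=\iint_{x,y\in\mathbb{R}^{d},\;\|x-y\|>t}G_{N}\big(|\log\|x-y\||^{\nu}\big)\,p(x)q(y)\,dx\,dy,\tag{2.4}$$

$$Q_{p,q}(\varepsilon,R):=\int_{\mathbb{R}^{d}}M_{q}^{\varepsilon}(x,R)\,p(x)\,dx,\tag{2.5}$$

$$T_{p,q}(\varepsilon,R):=\int_{\mathbb{R}^{d}}m_{q}^{-\varepsilon}(x,R)\,p(x)\,dx.\tag{2.6}$$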

Set $K_{p,q}(\nu,N):=K_{p,q}(\nu,N,e_{[N]})$ . Clearly, for any $N\in\mathbb{N}$ , $\nu,t,u>0$ such that $t<u$ , one has

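$$K_{p,q}(\nu,N,u)\leq K_{p,q}(\nu,N,t)\leq K_{p,q}(\nu,N,u)+\max\big\{G_{N}(|\log t|^{\nu}),G_{N}(|\log u|^{\nu})\big\}.$$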
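The two-sided bound can be checked by splitting the domain of integration: for $t<u$,

```latex
K_{p,q}(\nu,N,t)=K_{p,q}(\nu,N,u)+\iint_{t<\|x-y\|\leq u}G_{N}\big(|\log\|x-y\||^{\nu}\big)\,p(x)q(y)\,dx\,dy,
\qquad
0\leq\iint_{t<\|x-y\|\leq u}G_{N}\big(|\log\|x-y\||^{\nu}\big)\,p(x)q(y)\,dx\,dy\leq\max\big\{G_{N}(|\log t|^{\nu}),G_{N}(|\log u|^{\nu})\big\}.
```

Here one uses that $G_N$ is nondecreasing, that $s\mapsto|\log s|^{\nu}$ decreases on $(0,1]$ and increases on $[1,\infty)$ (so its maximum over $[t,u]$ is attained at an endpoint), and that $p(x)q(y)$ is a probability density on $\mathbb{R}^{2d}$.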

Remark 3

We stipulate that $1/0:=\infty$ (consequently $m_{q}^{-\varepsilon_{2}}(x,R):=\infty$ when $m_{q}(x,R)=0$ ). For arbitrary versions of $p$ and $q$ , we can write in (2.5), (2.6) the integrals over the support $S(p)$ instead of integrating over $\mathbb{R}^{d}$ (obviously, the results do not depend on the choice of versions).

Theorem 1

Let ${\sf P}_{X}$ and ${\sf P}_{Y}$ have densities $p$ and $q$ , respectively. Suppose that $p$ and $q$ are such that, for some $\varepsilon_{i}>0,R_{i}>0$ and $N_{j}\in\mathbb{N}$ , where $i=1,2,3,4$ and $j=1,2$ , the functionals $K_{p,q}(1,N_{1})$ , $Q_{p,q}(\varepsilon_{1},R_{1})$ , $T_{p,q}(\varepsilon_{2},R_{2})$ , $K_{p,p}(1,N_{2})$ , $Q_{p,p}(\varepsilon_{3},R_{3})$ , $T_{p,p}(\varepsilon_{4},R_{4})$ are finite. Then, for any fixed $k,l\in\mathbb{N}$ , the estimates $\widehat{D}_{n,m}(k,l)$ , introduced in (1.3), are asymptotically unbiased, i.e.

$$\lim_{n,m\to\infty}{\sf E}\,\widehat{D}_{n,m}(k,l)=D({\sf P}_{X}||{\sf P}_{Y}).\tag{2.8}$$

Remark 4

It is useful to note that if $Q_{p,q}(\varepsilon_{1},R_{1})<\infty$ and $T_{p,q}(\varepsilon_{2},R_{2})<\infty$ for some positive $\varepsilon_{1},\,\varepsilon_{2},R_{1},R_{2}$ then $\int_{\mathbb{R}^{d}}p(x)|\log{q(x)}|\,dx<\infty$ . Indeed, definition (2.2) and the Lebesgue differentiation theorem (see, e.g., Theorem 25.17 [43]) yield that $m_{q}(x,R_{2})\leq q(x)\leq M_{q}(x,R_{1})$ for $\mu$ -almost all $x\in\mathbb{R}^{d}$ . Evidently, $\log z\leq\frac{1}{\varepsilon}z^{\varepsilon}$ for any $z\geq 1$ and each $\varepsilon>0$ . Consequently,

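$$\int_{\mathbb{R}^{d}}p(x)|\log q(x)|\,dx=\int_{q(x)\geq 1}p(x)\log q(x)\,dx+\int_{q(x)<1}p(x)\log\frac{1}{q(x)}\,dx\leq\frac{1}{\varepsilon_{1}}Q_{p,q}(\varepsilon_{1},R_{1})+\frac{1}{\varepsilon_{2}}T_{p,q}(\varepsilon_{2},R_{2})<\infty.$$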

So, finiteness of the integrals $Q_{p,q}(\varepsilon_{1},R_{1})$, $T_{p,q}(\varepsilon_{2},R_{2})$, $Q_{p,p}(\varepsilon_{3},R_{3})$, $T_{p,p}(\varepsilon_{4},R_{4})$ implies finiteness of the integral in (1.2) (and also guarantees that ${\sf P}_{X}\ll{\sf P}_{Y}$).
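The elementary inequality $\log z\leq\frac{1}{\varepsilon}z^{\varepsilon}$ ($z\geq 1$) used in Remark 4 follows from a monotonicity argument, and it is then applied with $z=q(x)$ on $\{q\geq 1\}$ and with $z=1/q(x)$ on $\{q<1\}$:

```latex
g(z):=\frac{z^{\varepsilon}}{\varepsilon}-\log z,\qquad g(1)=\frac{1}{\varepsilon}>0,\qquad
g'(z)=z^{\varepsilon-1}-\frac{1}{z}=\frac{z^{\varepsilon}-1}{z}\geq 0\quad(z\geq 1),
\\
\log q(x)\leq\frac{1}{\varepsilon_{1}}M_{q}^{\varepsilon_{1}}(x,R_{1})\ \ \text{if } q(x)\geq 1,
\qquad
\log\frac{1}{q(x)}\leq\frac{1}{\varepsilon_{2}}m_{q}^{-\varepsilon_{2}}(x,R_{2})\ \ \text{if } q(x)<1,
```

for $\mu$-almost all $x$, by $m_{q}(x,R_{2})\leq q(x)\leq M_{q}(x,R_{1})$. Multiplying by $p(x)$ and integrating gives the bound displayed in Remark 4.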

Lemma 1

Let $p$ and $q$ be any probability densities in $\mathbb{R}^{d}$ . Then the following statements are valid.

$1)$ *If $K_{p,q}(\nu_{0},N_{0})<\infty$ for some $\nu_{0}>0$ and $N_{0}\in\mathbb{N}$, then $K_{p,q}(\nu,N)<\infty$ for any $\nu\in(0,\nu_{0}]$ and each $N\geq N_{0}$.*

$2)$ *If $Q_{p,q}(\varepsilon_{1},R_{1})<\infty$ for some $\varepsilon_{1}>0$ and $R_{1}>0$, then $Q_{p,q}(\varepsilon,R)<\infty$ for any $\varepsilon\in(0,\varepsilon_{1}]$ and each $R>0$.*

$3)$ *If $T_{p,q}(\varepsilon_{2},R_{2})<\infty$ for some $\varepsilon_{2}>0$ and $R_{2}>0$, then $T_{p,q}(\varepsilon,R)<\infty$ for any $\varepsilon\in(0,\varepsilon_{2}]$ and each $R>0$.*

The proof is given in the Appendix. In view of Lemma 1, one can recast Theorem 1 as follows.

Corollary 1

Let, for some positive $\varepsilon,R$ and $N\in\mathbb{N}$ , the functionals $K_{p,q}(1,N)$ , $Q_{p,q}(\varepsilon,R)$ , $T_{p,q}(\varepsilon,R)$ , $K_{p,p}(1,N)$ , $Q_{p,p}(\varepsilon,R)$ , $T_{p,p}(\varepsilon,R)$ be finite. Then (2.8) holds. Moreover, we obtain the equivalent conditions assuming that these functionals are finite for some $\varepsilon>0$ and $R=\varepsilon$ .

Let us also consider the following simple conditions.

$(A;p,q,\nu)$ For probability densities $p,q$ in $\mathbb{R}^{d}$ and some positive $\nu$

$$L_{p,q}(\nu):=\int_{\mathbb{R}^{d}}\int_{\mathbb{R}^{d}}|\log\|x-y\||^{\nu}\,p(x)q(y)\,dx\,dy<\infty.$$

We formally set $\log 0:=-\infty$ and, as usual, $\int_{A}g(z)Q(dz)=0$ whenever $g(z)=\infty$ (or $-\infty$ ) for $z\in A$ and $Q(A)=0$ , where $Q$ is a $\sigma$ -finite measure on $(\mathbb{R}^{d},\mathcal{B}(\mathbb{R}^{d}))$ .

$(B_{1};f)$ There exists a version of density $f$ such that, for some $M(f)\in(0,\infty)$ ,

$$f(x)\leq M(f),\quad x\in\mathbb{R}^{d}.$$

$(C_{1};f)$ There exists a version of density $f$ such that, for some $m(f)\in(0,\infty)$,

$$f(x)\geq m(f),\quad x\in S(f).$$

Corollary 2

Let conditions $(A;p,q,\nu)$ and $(A;p,p,\nu)$ be satisfied with some $\nu>1$ . Then (2.8) is true, provided that $(B_{1};f)$ and $(C_{1};f)$ are valid for $f=p$ and $f=q$ . Moreover, if the latter assumption concerning $(B_{1};f)$ and $(C_{1};f)$ holds then (2.8) is true whenever $p$ and $q$ have bounded supports.

Next we formulate conditions to guarantee $L^{2}$ -consistency of estimates (1.3).

Theorem 2

Let the requirements $K_{p,q}(1,N_{1})<\infty$ and $K_{p,p}(1,N_{2})<\infty$ in conditions of Theorem 1 be replaced by $K_{p,q}(2,N_{1})<\infty$ and $K_{p,p}(2,N_{2})<\infty$ . Then, for any fixed $k,l\in\mathbb{N}$ , the estimates $\widehat{D}_{n,m}(k,l)$ are $L^{2}$ -consistent, i.e.

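$$\lim_{n,m\to\infty}{\sf E}\big(\widehat{D}_{n,m}(k,l)-D({\sf P}_{X}||{\sf P}_{Y})\big)^{2}=0.\tag{2.10}$$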

Due to Lemma 1 one can recast Theorem 2 as follows.

Corollary 3

Let, for some positive $\varepsilon,R$ and $N\in\mathbb{N}$ , the functionals $K_{p,q}(2,N)$ , $Q_{p,q}(\varepsilon,R)$ , $T_{p,q}(\varepsilon,R)$ , $K_{p,p}(2,N)$ , $Q_{p,p}(\varepsilon,R)$ , $T_{p,p}(\varepsilon,R)$ be finite. Then (2.10) holds. Moreover, we obtain the equivalent conditions assuming that these functionals are finite for some $\varepsilon>0$ and $R=\varepsilon$ .

Corollary 4

Let conditions $(A;p,q,\nu)$ and $(A;p,p,\nu)$ be satisfied with some $\nu>2$ . Assume that $(B_{1};f)$ and $(C_{1};f)$ are valid for $f=p$ and $f=q$ . Then (2.10) is true. Moreover, if the latter assumption concerning $(B_{1};f)$ and $(C_{1};f)$ holds then (2.10) is true whenever $p$ and $q$ have bounded supports.

Note that D. Evans considered the “positive density condition” in Definition 2.1 of [14], meaning that there exist constants $\beta>1$ and $\delta>0$ such that $\frac{r^{d}}{\beta}\leq\int_{B(x,r)}q(y)dy\leq\beta r^{d}$ for all $0\leq r\leq\delta$ and $x\in\mathbb{R}^{d}$. Consequently, $m_{q}(x,\delta)\geq\frac{1}{\beta V_{d}}:=m>0$, $x\in\mathbb{R}^{d}$. Then $T_{p,q}(\varepsilon,\delta)\leq m^{-\varepsilon}\int_{\mathbb{R}^{d}}p(x)\,dx=m^{-\varepsilon}<\infty$ for all $\varepsilon>0$. Analogously, $M_{q}(x,\delta)\leq\frac{\beta}{V_{d}}:=M$, $M>0$, $x\in\mathbb{R}^{d}$, and $Q_{p,q}(\varepsilon,\delta)\leq M^{\varepsilon}\int_{\mathbb{R}^{d}}p(x)\,dx=M^{\varepsilon}<\infty$ for all $\varepsilon>0$. It was proved in [15] that if $f$ is smooth and its support is a compact convex body in $\mathbb{R}^{d}$, then the inequalities from Definition 2.1 of [14] hold. Therefore, if $p$ and $q$ are smooth and their supports are compact convex bodies in $\mathbb{R}^{d}$, one can simplify the conditions of Corollaries 1 and 3.

Now, instead of $(C_{1};f)$, we consider the following condition, introduced in [7], which allows us to work with densities whose supports need not be bounded.

$(C_{2};f)$ For a fixed $R>0$ , there exists a constant $c>0$ and a version of a density $f$ such that

$$m_{f}(x,R)\geq c\,f(x),\quad x\in\mathbb{R}^{d}.$$

Remark 5

If, for some positive $\varepsilon$, $R$ and $c$, condition $(C_{2};q)$ is true and

$$\int_{\mathbb{R}^{d}}q(x)^{-\varepsilon}p(x)\,dx<\infty,$$

then obviously $T_{p,q}(\varepsilon,R)<\infty$ . Thus in Theorems 1 and 2 one can employ, for $f=p$ and $f=q$ , condition $(C_{2};f)$ and suppose, for some $\varepsilon>0$ , finiteness of $\int_{\mathbb{R}^{d}}q(x)^{-\varepsilon}p(x)dx$ and $\int_{\mathbb{R}^{d}}p^{1-\varepsilon}(x)dx$ instead of the corresponding assumptions $T_{p,q}(\varepsilon,R)<\infty$ and $T_{p,p}(\varepsilon,R)<\infty$ . To illustrate this observation we provide a result for a density with unbounded support.

Corollary 5

Let $X$ , $Y$ be Gaussian random vectors in $\mathbb{R}^{d}$ with ${\sf E}X=\mu_{X}$ , ${\sf E}Y=\mu_{Y}$ and nondegenerate covariance matrices $\Sigma_{X}$ and $\Sigma_{Y}$ , respectively. Then relations (2.8) and (2.10) hold where

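$$D({\sf P}_{X}||{\sf P}_{Y})=\frac{1}{2}\Big(\operatorname{tr}(\Sigma_{Y}^{-1}\Sigma_{X})+(\mu_{Y}-\mu_{X})^{T}\Sigma_{Y}^{-1}(\mu_{Y}-\mu_{X})-d+\log\frac{\det\Sigma_{Y}}{\det\Sigma_{X}}\Big).$$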

The latter formula can be found, e.g., in [25], p. 147. The proof of Corollary 5 is discussed in Appendix.
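The closed-form Gaussian divergence of Corollary 5 is convenient as a ground truth when experimenting with the estimates (1.3). A minimal NumPy sketch; the name `gaussian_kl` is hypothetical, and `np.linalg.slogdet` is used instead of computing determinants directly, a numerical-stability choice of this sketch.

```python
import numpy as np


def gaussian_kl(mu_x, cov_x, mu_y, cov_y):
    """Closed-form D(P_X || P_Y) for Gaussian laws with nondegenerate covariances."""
    mu_x = np.atleast_1d(np.asarray(mu_x, dtype=float))
    mu_y = np.atleast_1d(np.asarray(mu_y, dtype=float))
    cov_x = np.atleast_2d(np.asarray(cov_x, dtype=float))
    cov_y = np.atleast_2d(np.asarray(cov_y, dtype=float))
    d = mu_x.shape[0]
    cov_y_inv = np.linalg.inv(cov_y)
    diff = mu_y - mu_x
    # log-determinants avoid overflow/underflow of det for larger d
    _, logdet_x = np.linalg.slogdet(cov_x)
    _, logdet_y = np.linalg.slogdet(cov_y)
    return 0.5 * (np.trace(cov_y_inv @ cov_x) + diff @ cov_y_inv @ diff - d + logdet_y - logdet_x)


print(gaussian_kl(0.0, 1.0, 1.0, 1.0))   # 0.5
```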

Similarly to condition $(C_{2};f)$ let us consider the following one.

$(B_{2};f)$ For a fixed $R>0$ , there exists a constant $C>0$ and a version of a density $f$ such that

$$M_{f}(x,R)\leq C\,f(x),\quad x\in S(f).$$

Remark 6

If, for some positive $\varepsilon$, $R$ and $C$, condition $(B_{2};q)$ is true and

$$\int_{\mathbb{R}^{d}}q(x)^{\varepsilon}p(x)\,dx<\infty$$

then obviously $Q_{p,q}(\varepsilon,R)<\infty$ . Thus in Theorems 1 and 2 one can employ, for $f=p$ and $f=q$ , condition $(B_{2};f)$ and suppose that $\int_{\mathbb{R}^{d}}q(x)^{\varepsilon}p(x)dx$ and $\int_{\mathbb{R}^{d}}p^{1+\varepsilon}(x)dx$ are finite (for some $\varepsilon>0$ ) instead of the assumptions $Q_{p,q}(\varepsilon,R)<\infty$ and $Q_{p,p}(\varepsilon,R)<\infty$ .

For a fixed $k\in\{1,\ldots,n-1\}$, consider the Kozachenko-Leonenko estimate of the Shannon differential entropy $H(X)$ of a vector $X$ with values in $\mathbb{R}^{d}$ having a density $p$ w.r.t. the Lebesgue measure. Namely, $H(X):=-\int_{\mathbb{R}^{d}}(\log p(x))p(x)\mu(dx)$ and, for i.i.d. observations $X_{1},X_{2},\ldots$ such that $law(X_{1})=law(X)$, set, for all $n\geq k+1$,

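$$\widehat{H}_{n}(k):=\frac{1}{n}\sum_{i=1}^{n}\log\Big(\frac{R_{n,k}^{d}(i)\,V_{d}\,(n-1)}{e^{\psi(k)}}\Big).$$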

Similarly to (1.4), one can employ the following generalization of the statistics $\widehat{H}_{n}(k)$:

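$$\widetilde{H}_{n}(\mathcal{K}_{n}):=-\frac{1}{n}\sum_{i=1}^{n}\psi(k_{i})+\log V_{d}+\log(n-1)+\frac{d}{n}\sum_{i=1}^{n}\log R_{n,k_{i}}(i),$$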

where $\mathcal{K}_{n}:=\{k_{i}\}_{i=1}^{n}$ , and, for some $r\in\mathbb{N}$ and all $i\in\mathbb{N}$ , $k_{i}\leq r$ .
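A brute-force Python sketch of the Kozachenko-Leonenko statistic $\widehat{H}_{n}(k)$ defined above, assuming integer $k$ (so $\psi(k)=-\gamma+\sum_{j<k}1/j$) and using $V_d=\pi^{d/2}/\Gamma(d/2+1)$ for the volume of the unit ball. All function names are hypothetical. For the standard normal law in $d=1$ the target value is $H(X)=\frac{1}{2}\log(2\pi e)$.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(k):
    """psi(k) = -gamma + sum_{j=1}^{k-1} 1/j for an integer k >= 1."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def unit_ball_volume(d):
    """V_d = pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def kl_entropy_estimate(X, k=1):
    """Brute-force sketch of the Kozachenko-Leonenko estimate hat H_n(k)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    dxx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dxx, np.inf)                  # exclude X_i itself
    R = np.sort(dxx, axis=1)[:, k - 1]             # R_{n,k}(i)
    # log( R^d * V_d * (n - 1) / e^{psi(k)} ), averaged over i
    terms = d * np.log(R) + math.log(unit_ball_volume(d)) + math.log(n - 1) - digamma_int(k)
    return float(terms.mean())


rng = np.random.default_rng(1)
X = rng.normal(size=(1500, 1))
print(kl_entropy_estimate(X, k=3), 0.5 * math.log(2 * math.pi * math.e))
```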

Corollary 6

Let $Q_{p,p}(\varepsilon,R)<\infty$ and $T_{p,p}(\varepsilon,R)<\infty$ for some positive $\varepsilon$ and $R$ . Then the following statements hold for any fixed $k\in\mathbb{N}$ .

1) If, for some $N\in\mathbb{N}$ , $K_{p,p}(1,N)<\infty$ , then ${\sf E}\widehat{H}_{n}(k)\to H(X),\;\;n\to\infty.$

2) If, for some $N\in\mathbb{N}$ , $K_{p,p}(2,N)<\infty$ , then ${\sf E}(\widehat{H}_{n}(k)-H(X))^{2}\to 0,\;\;n\to\infty.$

In particular, one can require $L_{p,p}(\nu)<\infty$ with $\nu>1$ instead of $K_{p,p}(1,N)<\infty$, and with $\nu>2$ instead of $K_{p,p}(2,N)<\infty$, where $N\in\mathbb{N}$.

The proof of the first statement of this corollary is contained in the proof of Theorem 1, Step 5. In a similar way one can infer the second statement of Corollary 6 by means of the proof of Theorem 2, Step 5.

3 Proof of Theorem 1

For $n,m\in\mathbb{N}$ such that $n>1$, for fixed $k,l\in\mathbb{N}$, where $1\leq k\leq n-1$, $1\leq l\leq m$ and $i=1,\ldots,n$, set $\phi_{m,l}(i)=mV^{d}_{m,l}(i)$, $\zeta_{n,k}(i)=(n-1)R^{d}_{n,k}(i)$. Then we can rewrite the estimate $\widehat{D}_{n,m}(k,l)$ as follows

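$$\widehat{D}_{n,m}(k,l)=\psi(k)-\psi(l)+\frac{1}{n}\sum_{i=1}^{n}\big(\log\phi_{m,l}(i)-\log\zeta_{n,k}(i)\big).\tag{3.1}$$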

It is sufficient to prove the following two claims.

Statement 1. For each fixed $l$ , all $m$ large enough and any $i\in\mathbb{N}$ , ${\sf E}|\log\phi_{m,l}(i)|$ is finite. Moreover,

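$${\sf E}\,\frac{1}{n}\sum_{i=1}^{n}\log\phi_{m,l}(i)={\sf E}\log\phi_{m,l}(1)\to\psi(l)-\log V_{d}-\int_{\mathbb{R}^{d}}p(x)\log q(x)\,dx,\quad m\to\infty.\tag{3.2}$$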

Statement 2. For each fixed $k$ , all $n$ large enough and any $i\in\mathbb{N}$ , ${\sf E}|\log\zeta_{n,k}(i)|$ is finite. Moreover,

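$${\sf E}\,\frac{1}{n}\sum_{i=1}^{n}\log\zeta_{n,k}(i)={\sf E}\log\zeta_{n,k}(1)\to\psi(k)-\log V_{d}-\int_{\mathbb{R}^{d}}p(x)\log p(x)\,dx,\quad n\to\infty.\tag{3.3}$$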

Then in view of (3.1), (3.2) and (3.3)

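$${\sf E}\,\widehat{D}_{n,m}(k,l)\to-\int_{\mathbb{R}^{d}}p(x)\log q(x)\,dx+\int_{\mathbb{R}^{d}}p(x)\log p(x)\,dx=D({\sf P}_{X}||{\sf P}_{Y}),\quad n,m\to\infty.$$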

We discuss in detail only the proof of Statement 1, since Statement 2 is established in a similar way. It was explained in [7] that if $V$ is a nonnegative random variable (hence ${\sf E}V\leq\infty$) and $X$ is an arbitrary random vector with values in $\mathbb{R}^{d}$, then

[TABLE]

Formula (3.4) means that simultaneously both sides are finite or infinite and coincide. Let $F(u,\omega)$ be a regular conditional distribution function of $V$ given $X$ where $u\in[0,\infty)$ and $\omega\in\Omega$ . Let $h$ be a measurable function such that $h:\mathbb{R}\to[0,\infty)$ . Then, for ${\sf P}_{X}$ -almost all $x\in\mathbb{R}^{d}$ , it follows (without assumption ${\sf E}h(V)<\infty$ ) that

[TABLE]

This means that both sides of (3.5) are finite or infinite simultaneously and coincide.

By virtue of (3.4) and (3.5) one can prove that ${\sf E}|\log\phi_{m,l}(i)|<\infty$ for all $m$ large enough, fixed $l$ and all $i\in\mathbb{N}$, and that (3.2) holds. For this purpose we take $V=\phi_{m,l}(i)$, $X=X_{i}$ and $h(u)=|\log u|$, $u>0$ (we use $h(u)=\log^{2}u$ in the proof of Theorem 2). For brevity, we consider below only the evaluation of ${\sf E}\log\phi_{m,l}(i)$, as all steps of the proof are the same when treating ${\sf E}|\log\phi_{m,l}(i)|$.

We divide the proof of Statement 1 into four steps. Preliminary Steps 1-3 establish, for $x\in A\subset S(p)$ and $i\in\mathbb{N}$, the relation

[TABLE]

where $A$ depends on the chosen versions of $p$ and $q$, and ${\sf P}_{X}(S(p)\setminus A)=0$. Then Step 4 justifies the desired result (3.2). Step 5 contains the validation of Statement 2.

Step 1. Here we establish convergence in distribution for auxiliary random variables. Fix any $i\in\mathbb{N}$ and $l\in\{1,\ldots,m\}$. To simplify notation we do not indicate the dependence of functions on $d$. For $x\in\mathbb{R}^{d}$ and $u>0$, we study the asymptotic behavior (as $m\to\infty$) of the following function

[TABLE]

where

[TABLE]

We have employed in (3.10) the independence of the random vectors $Y_{1},\ldots,Y_{m},X_{i}$ and the fact that $Y_{1},\ldots,Y_{m}$ have the same law as $Y$. We also took into account that the event $\left\{\left\lVert x-Y_{(l)}(x,\mathbb{Y}_{m})\right\rVert>r_{m}(u)\right\}$ is a union of pairwise disjoint events $A_{s}$, $s=0,\ldots,l-1$. Here $A_{s}$ means that exactly $s$ observations among $\mathbb{Y}_{m}$ belong to the ball $B(x,r_{m}(u))$ and the other $m-s$ are outside this ball (the probability that $Y$ belongs to the sphere $\{z\in\mathbb{R}^{d}:\|z-x\|=r\}$ equals $0$, since $Y$ has a density w.r.t. the Lebesgue measure $\mu$). Formulas (3.10) and (3.11) show that $F_{m,l,x}^{i}(u)$ is the regular conditional distribution function of $\phi_{m,l}(i)$ given $X_{i}=x$. Moreover, (3.10) means that $\phi_{m,l}(i)$, $i\in\{1,\ldots,n\}$, are identically distributed, so we may omit the dependence on $i$ and replace $F_{m,l,x}^{i}(u)$ with $F_{m,l,x}(u)$.

According to the Lebesgue differentiation theorem (see, e.g., [43], p. 654), if $q\in L^{1}(\mathbb{R}^{d})$ then, for $\mu$-almost all $x\in\mathbb{R}^{d}$, the following relation holds

[TABLE]

Let $\Lambda(q)$ denote the set of all Lebesgue points of a function $q$, i.e. the points $x\in\mathbb{R}^{d}$ satisfying (3.12). Clearly, $\Lambda(q)$ depends on the chosen version of $q$ in its equivalence class in $L^{1}(\mathbb{R}^{d})$ and, for an arbitrary version of $q$, we have $\mu(\mathbb{R}^{d}\setminus\Lambda(q))=0$.

Note that, for each $u>0$ , $r_{m}(u)\to 0$ as $m\to\infty$ , and $\mu(B(x,r_{m}(u)))=V_{d}{\big{(}r_{m}(u)\big{)}}^{d}=\frac{V_{d}u}{m}$ . Therefore by virtue of (3.12), for any fixed $x\in\Lambda(q)$ and $u>0$ ,

[TABLE]

where $\alpha_{m}{(x,u)}\to 0,\;m\to\infty$ . Hence, for $x\in\Lambda(q)\cap S(q)$ (thus $q(x)>0$ ), due to (3.10)

[TABLE]

Relation (3.14) means that

[TABLE]

where $\xi_{l,x}$ has $\Gamma(V_{d}\,q(x),l)$ distribution.
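Relation (3.14) can also be seen numerically: as $m\to\infty$ the binomial tail becomes a Poisson tail, i.e. the distribution function of $\Gamma(V_{d}q(x),l)$ with rate $V_{d}q(x)$ and shape $l$, whose CDF for integer shape is $1-\sum_{s=0}^{l-1}e^{-\alpha u}(\alpha u)^{s}/s!$. The sketch again assumes, for illustration only, $d=1$ and $Y\sim N(0,1)$; `unit_ball_volume` computes $V_{d}=\pi^{d/2}/\Gamma(d/2+1)$, and all helper names are hypothetical.

```python
import math

def unit_ball_volume(d):
    # V_d = pi^(d/2) / Gamma(d/2 + 1), the Lebesgue measure of the unit ball in R^d
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def normal_pdf(x):
    # density q of the illustrative law Y ~ N(0, 1)
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def F_binomial(u, m, l, x):
    # d = 1: r_m(u) = u / m and W = P(|Y - x| <= r_m(u))
    r = u / m
    w = 0.5 * (math.erf((x + r) / math.sqrt(2.0)) - math.erf((x - r) / math.sqrt(2.0)))
    return 1.0 - sum(math.comb(m, s) * w**s * (1.0 - w) ** (m - s) for s in range(l))

def gamma_cdf_int_shape(u, rate, shape):
    # CDF of Gamma(rate, shape) for an integer shape (Erlang law)
    t = rate * u
    return 1.0 - math.exp(-t) * sum(t**s / math.factorial(s) for s in range(shape))

x, l, u = 0.3, 3, 1.5
rate = unit_ball_volume(1) * normal_pdf(x)  # alpha = V_d q(x)
for m in (10, 100, 10_000, 1_000_000):
    print(m, F_binomial(u, m, l, x), gamma_cdf_int_shape(u, rate, l))
```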

We assume without loss of generality (w.l.g.) that, for all $x\in S(q)$, the random variables $\xi_{l,x}$ and $\{\xi_{m,l,x}\}_{m\geq l}$ are defined on a probability space $(\Omega,\mathcal{F},{\sf P})$, since in view of the Łomnicki–Ulam theorem (see, e.g., [18], p. 93) one can consider independent copies of $Y_{1},Y_{2},\ldots$ and $\{\xi_{l,x}\}_{x\in S(q)}$ defined on a certain probability space. Convergence in law of random variables is preserved under continuous mappings. Hence, for any $x\in\Lambda(q)\cap S(q)$, we come to the relation

[TABLE]

We took into account that, for each $x\in\Lambda(q)\cap S(q)$, one has $\xi_{l,x}>0$ a.s., and since $Y$ has a density we infer that ${\sf P}(\xi_{m,l,x}>0)={\sf P}(\left\lVert x-Y_{(l)}(x,\mathbb{Y}_{m})\right\rVert>0)=1$. More precisely, when taking logarithms we can ignore zero values of nonnegative random variables, since such values occur with probability zero.

Step 2. Now we show that, instead of verifying (3.6), it suffices to verify the following statement: for $\mu$-almost every $x\in\Lambda(q)\cap S(q)$,

[TABLE]

Note that if $\eta\sim\Gamma(\alpha,\lambda)$ , where $\alpha>0$ and $\lambda>0$ , then

[TABLE]

Set $\alpha=V_{d}q(x)$ , where $q(x)>0$ for $x\in S(q)$ , and $\lambda=l$ . Then ${\sf E}\log\xi_{l,x}=\psi{(l)}-\log{(V_{d}q(x))}=\psi{(l)}-\log{V_{d}}-\log{q(x)}$ . By virtue of (3.5), for each $x\in\mathbb{R}^{d}$ ,

[TABLE]

Thus, for $x\in\Lambda(q)\cap S(q)$, the relation ${\sf E}(\log\phi_{m,l}(1)|X_{1}=x)\to\psi{(l)}-\log{V_{d}}-\log{q(x)}$ holds if and only if (3.17) is true.
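The formula ${\sf E}\log\eta=\psi(\lambda)-\log\alpha$ for $\eta\sim\Gamma(\alpha,\lambda)$ (rate $\alpha$, shape $\lambda$, as with $\alpha=V_{d}q(x)$, $\lambda=l$ above) admits a quick Monte Carlo check. The sketch restricts to integer shapes, where $\psi(n)=-\gamma+\sum_{j=1}^{n-1}1/j$ with Euler's constant $\gamma$; the numerical values of rate and shape and the helper names are arbitrary illustrative choices.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    # psi(n) = -gamma + sum_{j=1}^{n-1} 1/j for positive integers n
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def mc_mean_log_gamma(rate, shape, size=400_000, seed=1):
    # NumPy parametrizes the gamma law by shape and scale = 1 / rate
    rng = np.random.default_rng(seed)
    eta = rng.gamma(shape, 1.0 / rate, size=size)
    return float(np.mean(np.log(eta)))

rate, shape = 0.76, 3  # illustrative stand-ins for V_d q(x) and l
exact = digamma_int(shape) - math.log(rate)
print(exact, mc_mean_log_gamma(rate, shape))
```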

According to Theorem 3.5 of [4], we would establish (3.17) if relation (3.16) could be supplemented, for $\mu$-almost all $x\in\Lambda(q)\cap S(q)$, by the uniform integrability of the family $\{\log\xi_{m,l,x}\}_{m\geq m_{0}(x)}$. Note that, for each $N\in\mathbb{N}$, the function $G_{N}(t)$ introduced by (2.3) is nondecreasing on $(0,\infty)$ and $\frac{G_{N}(t)}{t}\to\infty$ as $t\to\infty$. Therefore, by the de la Vallée-Poussin theorem (see, e.g., Theorem 1.3.4 of [6]), to guarantee, for $\mu$-almost every $x\in\Lambda(q)\cap S(q)$, the uniform integrability of $\{\log\xi_{m,l,x}\}_{m\geq m_{0}(x)}$, it suffices to prove, for such $x$ and some positive $C_{0}(x)$ and $m_{0}(x)\in\mathbb{N}$, that

[TABLE]

where $G_{N_{1}}$ appears in the conditions of Theorem 1.

Step 3 is devoted to proving (3.21). It is convenient to divide this proof into parts (3a), (3b), etc. For any $N\in\mathbb{N}$, set

[TABLE]

where the product over the empty set (when $N=1$) is equal to 1.

We will employ the following result, whose proof is given in the Appendix.

Lemma 2

Let $F(u),u\in\mathbb{R}$ , be a distribution function such that $F(0)=0$ . Then, for each $N\in\mathbb{N}$ , one has

1) $\int_{\left(0,\frac{1}{e_{[N]}}\right]}G_{N}(|\log u|)dF(u)=\int_{\left(0,\frac{1}{e_{[N]}}\right]}F(u)(-g_{N}(u))du$ ,

2) $\int_{\left(e_{[N]},\infty\right)}G_{N}(|\log u|)dF(u)=\int_{\left(e_{[N]},\infty\right)}(1-F(u))g_{N}(u)du$ .
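Lemma 2 is an integration-by-parts identity. Since $G_{N}$ and $g_{N}$ are defined in (2.3), outside this section, the sketch below replaces $G_{N}$ by a hypothetical stand-in $G(t)=(t-1)^{2}$ for $t>1$, $G(t)=0$ otherwise, so that $G(|\log u|)$ vanishes on $(1/e,e]$, playing the role of $\left(\frac{1}{e_{[N]}},e_{[N]}\right]$. We take $g(u)=\frac{d}{du}G(|\log u|)$, which is our reading of the relation between $g_{N}$ and $G_{N}$, and use the standard exponential law as $F$. Both integrals are computed after the substitution $u=e^{\pm s}$.

```python
import numpy as np

def G(t):
    # Hypothetical stand-in for G_N: zero for t <= 1 and continuously differentiable at t = 1
    return np.where(t > 1.0, (t - 1.0) ** 2, 0.0)

def G_prime(t):
    return np.where(t > 1.0, 2.0 * (t - 1.0), 0.0)

def trapezoid(y, s):
    # plain trapezoidal rule on the grid s
    return float(np.sum((y[1:] + y[:-1]) * np.diff(s)) / 2.0)

def F(u):
    return -np.expm1(-u)   # standard exponential distribution function, F(0) = 0

def survival(u):
    return np.exp(-u)      # 1 - F(u)

def density(u):
    return np.exp(-u)      # F'(u)

# Part 2), interval (e, inf): u = e^s, du = u ds, |log u| = s, g(u) = G'(s) / u
s = np.linspace(1.0, 8.0, 200_001)
u = np.exp(s)
lhs2 = trapezoid(G(s) * density(u) * u, s)      # int G(|log u|) dF(u)
rhs2 = trapezoid(survival(u) * G_prime(s), s)   # int (1 - F(u)) g(u) du

# Part 1), interval (0, 1/e]: u = e^{-s}, |du| = u ds, |log u| = s, -g(u) = G'(s) / u
s1 = np.linspace(1.0, 80.0, 400_001)
v = np.exp(-s1)
lhs1 = trapezoid(G(s1) * density(v) * v, s1)    # int G(|log u|) dF(u)
rhs1 = trapezoid(F(v) * G_prime(s1), s1)        # int F(u) (-g(u)) du

print(lhs2, rhs2)
print(lhs1, rhs1)
```

The boundary terms of the integration by parts vanish because $G(|\log u|)=0$ at the inner endpoints and $F$, $1-F$ decay fast enough at the outer ones; this is the role of the hypothesis $F(0)=0$ and of the choice of the intervals.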

Note that, for $u\in\left(\frac{1}{e_{[N_{1}]}},e_{[N_{1}]}\right]$, we have $G_{N_{1}}(|\log u|)=0$. Therefore, due to Lemma 2, for $x\in\Lambda(q)\cap S(q)$ and $m\geq l$, we get ${\sf E}G_{N_{1}}(|\log{\xi_{m,l,x}}|)=I_{1}(m,x)+I_{2}(m,x)$, where

[TABLE]

For convenience we write $I_{1}(m,x)$ and $I_{2}(m,x)$ without indicating their dependence on $N_{1}$, $l$ and $d$. Recall that $N_{1}$ is fixed.

Part (3a). We provide bounds for $I_{1}(m,x)$. Take $R_{1}>0$ appearing in the conditions of Theorem 1 and any $u\in\left(0,\frac{1}{e_{[N_{1}]}}\right]$. Denote $m_{1}:=\max\left\{\left\lceil\frac{1}{e_{[N_{1}]}R_{1}^{d}}\right\rceil,l\right\}$, where $\lceil a\rceil:=\inf\{m\in\mathbb{Z}:m\geq a\}$, $a\in\mathbb{R}$. Then $r_{m}(u)=\left(\frac{u}{m}\right)^{1/d}\leq{\left(\frac{1}{e_{[N_{1}]}m}\right)}^{1/d}\leq R_{1}$ if $m\geq m_{1}$. Note also that everywhere below we can consider only $m\geq l$, because the size of the sample $\mathbb{Y}_{m}$ cannot be less than the number $l$ of neighbors (see, e.g., (3.10)). Thus, for $R_{1}>0$, $u\in\left(0,\frac{1}{e_{[N_{1}]}}\right]$, $x\in\mathbb{R}^{d}$ and $m\geq m_{1}$,

[TABLE]

and we obtain an inequality

[TABLE]

If $\varepsilon\in(0,1]$ and $t\in[0,1]$ then, for all $m\geq 1$ , invoking the Bernoulli inequality, one has

[TABLE]

By the assumptions of the Theorem, $Q_{p,q}(\varepsilon_{1},R_{1})<\infty$ for some $\varepsilon_{1}>0$, $R_{1}>0$. According to Lemma 1 we can assume that $\varepsilon_{1}<1$. Thus, due to (3.23) and since $W_{m,x}(u)\in[0,1]$ for all $x\in\mathbb{R}^{d}$, $u>0$ and $m\geq l$, we get

[TABLE]

In view of (3.10), (3.22) and (3.24) one can now claim that, for all $x\in\Lambda(q)\cap S(q)$, $u\in(0,\frac{1}{e_{[N_{1}]}}]$ and $m\geq m_{1}$,

[TABLE]

Therefore, for any $x\in\Lambda(q)\cap S(q)$ and $m\geq m_{1}$ , one can write

[TABLE]

where $U_{1}(\varepsilon,N,d):=V_{d}^{\varepsilon}L_{N}(\varepsilon)$ , $L_{N}(\varepsilon):=\int_{[e_{[N-1]},\infty)}(\log_{[N]}(t)+1)e^{-\varepsilon t}dt<\infty$ for each $\varepsilon>0$ and any $N\in\mathbb{N}$ . We took into account that $(-g_{N_{1}}(u))\leq\frac{1}{u}(\log_{[N_{1}]}(-\log u)+1)$ if $u\in\left(0,\frac{1}{e_{[N_{1}]}}\right]$ .

Part (3b). We give bounds for $I_{2}(m,x)$ . Since $g_{N_{1}}(u)\leq\frac{\log_{[N_{1}+1]}(u)+1}{u}$ if $u\in(e_{[N_{1}]},\infty)$ , we can write, for $m\geq\max\{e^{2}_{[N_{1}]},l\}$ ,

[TABLE]

Evidently,

[TABLE]

where $P_{m,x}(u)=1-W_{m,x}(u)$ and $Z\sim{\sf Bin}(m,P_{m,x}(u))$ .

By Markov’s inequality ${\sf P}(Z\geq x)\leq e^{-\lambda x}{\sf E}e^{\lambda Z}$ for any $\lambda>0$ and $x>0$ . One has

[TABLE]

Consequently, for each $\lambda>0$ ,

[TABLE]

To simplify bounds we take $\lambda=1$ and set $S_{1}=S_{1}(l):=e^{l-1}$ , $S_{2}:=1-\frac{1}{e}$ (recall that $l$ is fixed). Thus $S_{1}\geq 1$ and $S_{2}<1$ . Therefore,

[TABLE]

where we have used the elementary inequality $1-t\leq e^{-t}$, $t\in[0,1]$.
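With $\lambda=1$ the constants $S_{1}=e^{l-1}$ and $S_{2}=1-\frac{1}{e}$ arise as follows (our reconstruction of the intermediate displays): writing $w=W_{m,x}(u)$, one has $1-F_{m,l,x}(u)={\sf P}(Z\geq m-l+1)\leq e^{-(m-l+1)}(w+(1-w)e)^{m}=e^{l-1}(1-S_{2}w)^{m}\leq S_{1}e^{-S_{2}mw}$. The sketch checks both inequalities for a few arbitrary values of $m$, $l$ and $w$; the function names are hypothetical.

```python
import math

def survival_exact(m, l, w):
    # 1 - F_{m,l,x}(u): at most l - 1 of the m points fall in the ball, w = W_{m,x}(u)
    return sum(math.comb(m, s) * w**s * (1.0 - w) ** (m - s) for s in range(l))

def chernoff_bound(m, l, w):
    # P(Z >= m - l + 1) <= e^{-(m-l+1)} E e^{Z}, Z ~ Bin(m, 1 - w); computed in log-space
    return math.exp(-(m - l + 1) + m * math.log(w + (1.0 - w) * math.e))

def simplified_bound(m, l, w):
    # S1 * exp(-S2 * m * w) with S1 = e^{l-1}, S2 = 1 - 1/e
    s1, s2 = math.exp(l - 1), 1.0 - 1.0 / math.e
    return s1 * math.exp(-s2 * m * w)

for m, l, w in ((5, 1, 0.3), (50, 2, 0.05), (500, 4, 0.001)):
    print(survival_exact(m, l, w), chernoff_bound(m, l, w), simplified_bound(m, l, w))
```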

For $R_{2}>0$ appearing in the conditions of the Theorem and any $u\in\left(e_{[N_{1}]},\sqrt{m}\right]$, one can choose $m_{2}:=\max\left\{\left\lceil\frac{1}{R_{2}^{2d}}\right\rceil,\left\lceil e_{[N_{1}]}^{2}\right\rceil,l\right\}$ such that if $m\geq m_{2}$ then $r_{m}(u)=\left(\frac{u}{m}\right)^{1/d}\leq\left(\frac{1}{\sqrt{m}}\right)^{1/d}\leq R_{2}.$ Due to (3.11) and (3.39), for $u\in(e_{[N_{1}]},\sqrt{m}]$ and $m\geq m_{2}$, one has

[TABLE]

by the definition of $m_{f}$ (for $f=q$) in (2.2). Now we use the following result, which is Lemma 3.2 of [7].

Lemma 3

For a version of a density $q$ and each $R>0$ , one has $\mu(S(q)\setminus D_{q}(R))=0$ where $D_{q}(R):=\{x\in S(q):m_{q}(x,R)>0\}$ and $m_{q}(\cdot,R)$ is defined according to (2.2).

It is easily seen that, for any $t>0$ and each $\delta\in(0,e]$, one has $e^{-t}\leq t^{-\delta}$ (indeed, $\delta\log t-t$ attains its maximum $\delta(\log\delta-1)\leq 0$ at $t=\delta$). Thus, for $x\in D_{q}(R_{2})$, $m\geq m_{2}$, $u\in(e_{[N_{1}]},\sqrt{m}]$ and $\varepsilon_{2}>0$, we deduce from the conditions of the Theorem (in view of Lemma 1 one can suppose that $\varepsilon_{2}\in(0,e]$), taking into account that $m_{q}(x,R_{2})>0$ for $x\in D_{q}(R_{2})$ and applying relation (3.42), that

[TABLE]

Thus, for all $x\in\Lambda(q)\cap S(q)\cap D_{q}(R_{2})$ and any $m\geq m_{2}$ ,

[TABLE]

where $U_{2}(\varepsilon,N,d,l):=S_{1}(l)\,L_{N}(\varepsilon)(S_{2}\,V_{d})^{-\varepsilon}$ .
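The elementary inequality $e^{-t}\leq t^{-\delta}$, $t>0$, $\delta\in(0,e]$, used in this part is equivalent to $\delta\log t-t\leq 0$. The maximum of the left-hand side over $t>0$ is $\delta(\log\delta-1)$, attained at $t=\delta$, and it is nonpositive exactly when $\delta\leq e$. The grid check below is a numerical illustration, not a proof; it also shows that the inequality fails for $\delta=3>e$.

```python
import math
import numpy as np

t = np.logspace(-6, 3, 100_001)  # grid of t in [1e-6, 1e3]

def holds_on_grid(delta, tol=1e-12):
    # e^{-t} <= t^{-delta}  <=>  delta * log(t) - t <= 0
    return bool(np.all(delta * np.log(t) - t <= tol))

def max_exponent(delta):
    # exact maximum of delta * log(t) - t over t > 0, attained at t = delta
    return delta * (math.log(delta) - 1.0)

for delta in (0.5, 1.0, math.e, 3.0):
    print(delta, holds_on_grid(delta), max_exponent(delta))
```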

Part (3c). Consider $J_{2}(m,x)$ . In view of (3.43), for all $x\in\Lambda(q)\cap S(q)\cap D_{q}(R_{2})$ and any $m\geq m_{2}$ , it holds $1-F_{m,l,x}(\sqrt{m})\leq S_{1}\left(S_{2}V_{d}\,m_{q}(x,R_{2})\sqrt{m}\right)^{-\varepsilon_{2}}$ . Thus (as $m_{2}\geq 2$ )

[TABLE]

Then, for all $x\in\Lambda(q)\cap S(q)\cap D_{q}(R_{2})$ and any $m\geq m_{2}$ ,

[TABLE]

where $U_{3}(m,\varepsilon_{2},N_{1},d,l):=\frac{3}{2}S_{1}(l)(S_{2}V_{d})^{-\varepsilon_{2}}m^{-\frac{\varepsilon_{2}}{2}}\log{m}\left(\log_{[N_{1}]}(2\log{m})+1\right)\to 0$ , $m\to\infty$ .

Part (3d). To get bounds for $J_{3}(m,x)$ we employ several auxiliary results.

Lemma 4

For each $N\in\mathbb{N}$ and any $\nu>0$ , there are $a:=a(d,\nu)\geq 0,\,b:=b(N,d,\nu)\geq 0$ such that, for arbitrary $x,y\in\mathbb{R}^{d}$ ,

[TABLE]

The proof is provided in the Appendix.

On the one hand, by (3.11), for any $w\geq 0$ , we get

[TABLE]

On the other hand, by (3.10), one has $F_{1,1,x}(w)=1-\big{(}1-W_{1,x}(w)\big{)}=W_{1,x}(w)$ . Consequently, for any $m\in\mathbb{N}$ , $w\geq 0$ and all $x\in\mathbb{R}^{d}$ ,

[TABLE]

Moreover, $F_{1,1,x}(w)={\sf P}(\left\lVert Y-x\right\rVert^{d}\leq w)$ . So, $\xi_{1,1,x}\stackrel{{\scriptstyle law}}{{=}}\left\lVert Y-x\right\rVert^{d}$ . Thus, in view of Lemmas 2 and 4 (for $N=N_{1}$ and $\nu=1$ )

[TABLE]

since $G_{N}(t)=0$ for $t\in[0,e_{[N-1]}]$ , $N\in\mathbb{N}$ .

Now we will estimate $1-F_{m,l,x}(u)$ in a way different from (3.38). Fix any $\delta>0$ . Note that, for all $m\geq(l-1)\left(1+\frac{1}{\delta}\right)$ and $s\in\{0,\ldots,l-1\}$ , it holds $\frac{m}{m-s}\leq\frac{m}{m-l+1}\leq 1+\delta$ . Then, for all $x\in\mathbb{R}^{d}$ , $u\geq 0$ and $m\geq\max\{l,(l-1)\left(1+\frac{1}{\delta}\right)\}$ , in view of (3.10) one can write

[TABLE]

We are going to employ the following statement as well.

Lemma 5

For each $N\in\mathbb{N}$ , a function $\log_{[N]}(t)$ , $t>e_{[N-1]}$ , is slowly varying at infinity.

Its proof is elementary and thus is omitted.

Part (3e). Now we are ready to get the bound for $J_{3}(m,x)$ . Set $u=mw$ . Then one has

[TABLE]

The inequality $w>m$ and Lemma 5 imply $\log_{[N_{1}+1]}(mw)\leq\log_{[N_{1}+1]}(w^{2})=\log_{[N_{1}]}(2\log{w})\leq 2\log_{[N_{1}+1]}(w)$ for $w$ large enough, namely for all $w\geq W$, where $W=W(N_{1})$.
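Assuming that $\log_{[N]}$ denotes the $N$-fold iterated logarithm (the notation is introduced earlier in the paper), the step $\log_{[N_{1}]}(2\log w)\leq 2\log_{[N_{1}+1]}(w)$ reflects the slow variation from Lemma 5: $\log_{[N]}(2s)/\log_{[N]}(s)\to 1$ as $s=\log w\to\infty$. The sketch evaluates this ratio directly from $s=\log w$, which avoids overflow for very large $w$; `iter_log` and `ratio` are hypothetical helpers.

```python
import math

def iter_log(t, n):
    # log_[n](t): the logarithm applied n times (assumed meaning of the notation)
    for _ in range(n):
        t = math.log(t)
    return t

def ratio(log_w, n):
    # log_[n+1](w^2) / log_[n+1](w) = log_[n](2 log w) / log_[n](log w), computed from log w
    return iter_log(2.0 * log_w, n) / iter_log(log_w, n)

for n in (1, 2):
    print(n, [round(ratio(s, n), 4) for s in (1e1, 1e3, 1e10, 1e100)])
```

For $n=1$ the ratio equals $1+\frac{\log 2}{\log\log w}$, so it is at most $2$ as soon as $\log w\geq 2$; for larger $n$ the convergence to $1$ is even faster in terms of $\log w$, but the admissible range of $w$ starts later.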

Take $\delta>0$ and set $m_{3}:=\max\left\{l,\left\lceil(l-1)\left(1+\frac{1}{\delta}\right)\right\rceil,\left\lceil W(N_{1})\right\rceil,\left\lceil e_{[N_{1}]}\right\rceil\right\}$ . Let further $m\geq m_{3}$ . Then

[TABLE]

By virtue of (3.49) and (3.56) one has

[TABLE]

Hence it can be seen that

[TABLE]

Introduce

[TABLE]

Let us note: 1) ${\sf P}_{X}(S(p)\setminus A_{p}(G_{N_{1}}))=0$ as we assumed that $K_{p,q}(1,N_{1})<\infty$ ; 2) ${\sf P}_{X}(S(p)\setminus S(q))=0$ as ${\sf P}_{X}\ll{\sf P}_{Y}$ ; 3) $\mu\big{(}S(q)\setminus(\Lambda(q)\cap D_{q}(R_{2}))\big{)}=0$ due to Lemma 3. Since ${\sf P}_{X}\ll\mu$ we conclude that ${\sf P}_{X}\big{(}S(q)\setminus(\Lambda(q)\cap D_{q}(R_{2}))\big{)}=0$ . Hence, one has ${\sf P}_{X}\big{(}S(p)\setminus(\Lambda(q)\cap D_{q}(R_{2}))\big{)}=0$ in view of 2) and because $B\setminus C\subset(B\setminus A)\cup(A\setminus C)$ for any $A,B,C\subset\mathbb{R}^{d}$ . Set further $A:=\Lambda(q)\cap S(q)\cap D_{q}(R_{2})\cap S(p)\cap A_{p}(G_{N_{1}})$ . It follows from 1), 2) and 3) that ${\sf P}_{X}(S(p)\setminus A)=0$ , so ${\sf P}_{X}(A)=1$ . We are going to consider only $x\in A$ .

Then, by virtue of (3.55) and (3.59), for all $m\geq m_{3}$ and $x\in A$ , we come to the inequality

[TABLE]

where $A(\delta,d):=2(1+\delta)a(d,1)$ , $B(\delta,d,N_{1}):=2(1+\delta)b(N_{1},d,1)$ .

Part (3f). Thus, for each $x\in A$ and $m\geq\max\{m_{1},m_{2},m_{3}\}$ , taking into account (3.30), (3.46), (3.47) and (3.60) we can claim that

[TABLE]

Moreover, for any $\kappa>0$ , one can take $m_{4}=m_{4}(\kappa,\varepsilon_{2},N_{1},d,l)\in\mathbb{N}$ such that $U_{3}(m,\varepsilon_{2},N_{1},d,l)\leq\kappa$ for $m\geq m_{4}$ . Then by virtue of (3.64), for each $x\in A$ and $m\geq m_{0}:=\max\{m_{1},m_{2},m_{3},m_{4}\}$ ,

[TABLE]

Hence, for each $x\in A$ , the uniform integrability of the family $\left\{\log{\xi_{m,l,x}}\right\}_{m\geq m_{0}}$ is established.

Step 4. Now we verify (2.8). We have already proved, for each $x\in A$ (thus, for ${\sf P}_{X}$-almost every $x$ belonging to $S(p)$), that ${\sf E}(\log\phi_{m,l}(1)|X_{1}=x)\to\psi(l)-\log{V_{d}}-\log q(x)$, $m\to\infty$. Set $Z_{m,l}(x):={\sf E}(\log\phi_{m,l}(1)|X_{1}=x)={\sf E}\log{\xi_{m,l,x}}$. Consider $x\in A$ and take any $m\geq\max\{m_{1},m_{2},m_{3},m_{4}\}$. We use the following property of $G_{N}$, which is proved in the Appendix.

Lemma 6

For each $N\in\mathbb{N}$ , a function $G_{N}$ is convex on $\mathbb{R}_{+}$ .

Thus the function $G_{N_{1}}$ is nondecreasing and convex. By Jensen's inequality

[TABLE]

Relation (3.67) guarantees that, for all $m\geq m_{0}$ ,

[TABLE]

We have established uniform integrability of the family $\{Z_{m,l}\}_{m\geq m_{0}}$ w.r.t. measure ${\sf P}_{X}$ . Thus, for $i\in\mathbb{N}$ ,

[TABLE]

and we come to relation (3.2).

Step 5. Let us briefly discuss Statement 2. Similarly to $F_{m,l,x}(u)$, one can introduce, for $n,k\in\mathbb{N}$, $n\geq k+1$, $x\in\mathbb{R}^{d}$ and $u>0$, the function

[TABLE]

where $r_{n}(u)$ was defined in (3.11),

[TABLE]

Formulas (3.71) and (3.72) show that $\widetilde{F}_{n,k,x}(u)$ is the regular conditional distribution function of $\zeta_{n,k}(i)$ given $X_{i}=x$ . Moreover, for any fixed $u>0$ and $x\in\Lambda(p)\cap S(p)$ (thus $p(x)>0$ ),

[TABLE]

Hence, $\widetilde{\xi}_{n,k,x}\stackrel{{\scriptstyle law}}{{\rightarrow}}\widetilde{\xi}_{k,x}$ , $x\in\Lambda(p)\cap S(p)$ , $n\to\infty$ . For $N\in\mathbb{N}$ , set $\widetilde{A}_{p}(G_{N}):=\{x\in S(p):\widetilde{R}_{N}(x)<\infty\}$ , where

[TABLE]

Introduce $\widetilde{A}:=\Lambda(p)\cap S(p)\cap D_{p}(R_{4})\cap\widetilde{A}_{p}(G_{N_{2}})$. Then ${\sf P}_{X}(\widetilde{A})=1$ and, for $x\in\widetilde{A}$, one can verify that ${\sf E}G_{N_{2}}(|\log{\widetilde{\xi}_{n,k,x}}|)\leq\widetilde{C}_{0}(x)<\infty$ and therefore ${\sf E}\log\widetilde{\xi}_{n,k,x}\to{\sf E}\log\widetilde{\xi}_{k,x}$. Thus ${\sf E}(\log\zeta_{n,k}(1)|X_{1}=x)\to\psi(k)-\log{V_{d}}-\log p(x)$, $n\to\infty$. Set $\widetilde{Z}_{n,k}(x):={\sf E}(\log\zeta_{n,k}(1)|X_{1}=x)$. One can see that, for all $n\geq n_{0}$, $\int_{\mathbb{R}^{d}}G_{N_{2}}(|\widetilde{Z}_{n,k}(x)|)p(x)\,dx<\infty$. Hence, similarly to Steps 1–4, we come to relation (3.3).

The proof of Theorem 1 is complete. $\square$

4 Proof of Theorem 2

First of all, note that, in view of Lemma 1, the finiteness of $K_{p,q}(2,N_{1})$ and $K_{p,p}(2,N_{2})$ implies the finiteness of $K_{p,q}(1,N_{1})$ and $K_{p,p}(1,N_{2})$, respectively. Thus the conditions of Theorem 2 entail the statements of Theorem 1. Consequently, under the conditions of Theorem 2, for $n$ and $m$ large enough, one can claim that $\widehat{D}_{n,m}(k,l)\in L^{1}(\Omega)$ and ${\sf E}\widehat{D}_{n,m}(k,l)\to D({\sf P}_{X}||{\sf P}_{Y})$ as $n,m\to\infty$.

We will show that $\widehat{D}_{n,m}(k,l)\in L^{2}(\Omega)$ for all $n$ and $m$ large enough. Then we can write

[TABLE]

Therefore, to prove (2.10), we will show that ${\sf var}\left(\widehat{D}_{n,m}(k,l)\right)\to 0$ as $n,m\to\infty$.
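The reduction uses ${\sf E}(\widehat{D}_{n,m}(k,l)-D)^{2}={\sf var}(\widehat{D}_{n,m}(k,l))+({\sf E}\widehat{D}_{n,m}(k,l)-D)^{2}$, where $D=D({\sf P}_{X}||{\sf P}_{Y})$. The simulation below illustrates both terms for two Gaussian laws on $\mathbb{R}$ with a closed-form divergence. It is a sketch under an assumption: consistent with the limits of ${\sf E}\log\phi_{m,l}(1)$ and ${\sf E}\log\zeta_{n,k}(1)$ obtained in the proof of Theorem 1, we take the estimator in the form $\widehat{D}_{n,m}(k,l)=\frac{1}{n}\sum_{i=1}^{n}\big(\log\phi_{m,l}(i)-\log\zeta_{n,k}(i)\big)+\psi(k)-\psi(l)$ with $\phi_{m,l}(i)=mV_{m,l}^{d}(i)$ and $\zeta_{n,k}(i)=(n-1)R_{n,k}^{d}(i)$; the definition used in the paper is the one referred to as (3.1). The laws $N(0,1)$ and $N(1,4)$, the values $k=l=3$ and all function names are illustrative choices.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    # psi(n) for a positive integer n
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def knn_kl_estimate(x, y, k, l):
    # assumed form of the estimator; x ~ P with shape (n,), y ~ Q with shape (m,), d = 1
    n, m, d = len(x), len(y), 1
    dxx = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(dxx, np.inf)                    # exclude X_i from its own sample
    r = np.partition(dxx, k - 1, axis=1)[:, k - 1]   # R_{n,k}(i)
    dxy = np.abs(x[:, None] - y[None, :])
    v = np.partition(dxy, l - 1, axis=1)[:, l - 1]   # V_{m,l}(i)
    log_phi = math.log(m) + d * np.log(v)            # log phi_{m,l}(i)
    log_zeta = math.log(n - 1) + d * np.log(r)       # log zeta_{n,k}(i)
    return float(np.mean(log_phi - log_zeta)) + digamma_int(k) - digamma_int(l)

def gaussian_kl(mu1, s1, mu2, s2):
    # closed form of D(N(mu1, s1^2) || N(mu2, s2^2))
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

def simulate(n, m, k=3, l=3, reps=20, seed=7):
    rng = np.random.default_rng(seed)
    est = np.array([knn_kl_estimate(rng.normal(0.0, 1.0, n), rng.normal(1.0, 2.0, m), k, l)
                    for _ in range(reps)])
    target = gaussian_kl(0.0, 1.0, 1.0, 2.0)
    mse = float(np.mean((est - target) ** 2))
    var = float(np.var(est))                   # ddof=0, so mse = var + bias^2 exactly
    bias2 = float((np.mean(est) - target) ** 2)
    return target, float(np.mean(est)), mse, var, bias2

for n in (100, 400, 1000):
    print(n, simulate(n, n))
```

The brute-force distance matrices cost $O(nm)$ memory, which is adequate for an illustration; a tree-based neighbor search would be the natural replacement for large samples.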

Due to (3.10) the random variables $\log{\phi_{m,l}(1)},\ldots,\log{\phi_{m,l}(n)}$ are identically distributed (and $\log{\zeta_{n,k}(1)}$ , $\ldots,\log{\zeta_{n,k}}(n)$ are identically distributed as well). Hence (3.1) yields

[TABLE]

We do not strictly adhere to the notation used in the proof of Theorem 1. Namely, the choice of the sets $A\subset\mathbb{R}^{d}$, $\widetilde{A}\subset\mathbb{R}^{d}$, positive $U_{j},C_{j}(x),\widetilde{C}_{j}(x)$ and integers $m_{j},n_{j}$, where $j\in\mathbb{Z}_{+}$ and $x\in\mathbb{R}^{d}$, may differ. The proof of Theorem 2 is also divided into several steps. Steps 1-3 are devoted to proving the relation $\frac{1}{n}{\sf var}{(\log\phi_{m,l}(1))}\to 0$ as $n,m\to\infty$, while Step 4 contains the proof of the relation $\frac{2}{n^{2}}\sum_{1\leq i<j\leq n}{\sf cov}(\log\phi_{m,l}(i),\log\phi_{m,l}(j))\to 0$ as $n,m\to\infty$. In Step 5 we establish that

[TABLE]

This step is rather involved. In Step 6 we come to the desired statement ${\sf var}\left(\widehat{D}_{n,m}(k,l)\right)\to 0$ , $n,m\to\infty$ .

Step 1. We study ${\sf E}\log^{2}\left(\phi_{m,l}(1)\right)$ , as $m\to\infty$ . Consider

[TABLE]

where the first four sets appeared in the proof of Theorem 1, and $A_{p,2}(G_{N})$, for $N\in\mathbb{N}$ and a probability density $p$ on $\mathbb{R}^{d}$, is defined similarly to $A_{p}(G_{N})$. Namely, for $x\in\mathbb{R}^{d}$ and $N\in\mathbb{N}$, introduce

[TABLE]

and set $A_{p,2}(G_{N}):=\{x\in S(p):R_{N,2}(x)<\infty\}$ . Then ${\sf P}_{X}(S(p)\setminus A_{p,2}(G_{N_{1}}))=0$ since $K_{p,q}(2,N_{1})<\infty$ . It is easily seen that ${\sf P}_{X}(A)=1$ . The reasoning is the same as in the proof of Theorem 1.

Recall that, for each $x\in A$ , one has $\log\xi_{m,l,x}\stackrel{{\scriptstyle law}}{{\rightarrow}}\log\xi_{l,x},\,m\to\infty$ , where $\xi_{m,l,x}:=m\left\lVert x-Y_{(l)}(x,\mathbb{Y}_{m})\right\rVert^{d}$ and $\xi_{l,x}$ has $\Gamma(V_{d}\,q(x),l)$ distribution. Convergence in law of random variables is preserved under continuous mapping. Hence, for any $x\in A$ , we come to the relation

[TABLE]

In view of (3.10), for each $x\in A$ ,

[TABLE]

Note that if $\eta\sim\Gamma(\alpha,\lambda)$ , where $\alpha>0$ and $\lambda>0$ , then

[TABLE]

Since $\xi_{l,x}\sim\Gamma(V_{d}q(x),l)$ for $x\in S(q)$ , one has

[TABLE]

where $h_{1}:=h_{1}(l,d)$ and $h_{2}:=h_{2}(l,d)$ depend only on the fixed $l$ and $d$.
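For $\eta\sim\Gamma(\alpha,\lambda)$ one has ${\sf E}\log^{2}\eta=\psi'(\lambda)+(\psi(\lambda)-\log\alpha)^{2}$, since ${\sf var}(\log\eta)=\psi'(\lambda)$ (the trigamma function). With $\alpha=V_{d}q(x)$ and $\lambda=l$ this is a quadratic polynomial in $\log q(x)$; we assume this is the form of the display above, with $h_{1}=-2(\psi(l)-\log V_{d})$ and $h_{2}=(\psi(l)-\log V_{d})^{2}+\psi'(l)$. The sketch checks the moment formula by Monte Carlo for integer shapes, where $\psi'(n)=\frac{\pi^{2}}{6}-\sum_{j=1}^{n-1}j^{-2}$; the helper names and the numerical values are illustrative.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def trigamma_int(n):
    # psi'(n) = pi^2/6 - sum_{j=1}^{n-1} 1/j^2 for positive integers n
    return math.pi**2 / 6 - sum(1.0 / j**2 for j in range(1, n))

def second_moment_log_gamma(rate, shape):
    # E log^2 eta = Var(log eta) + (E log eta)^2
    return trigamma_int(shape) + (digamma_int(shape) - math.log(rate)) ** 2

def h_coefficients(l, d):
    # assumed h_1, h_2: coefficients of the quadratic in log q(x)
    c = digamma_int(l) - math.log(math.pi ** (d / 2) / math.gamma(d / 2 + 1))
    return -2.0 * c, c**2 + trigamma_int(l)

rng = np.random.default_rng(3)
rate, shape = 0.76, 3
sample = rng.gamma(shape, 1.0 / rate, size=400_000)   # NumPy uses scale = 1 / rate
print(second_moment_log_gamma(rate, shape), float(np.mean(np.log(sample) ** 2)))
```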

We prove now that, for $x\in A$ , one has

[TABLE]

By virtue of (4.11) and (4.15) relation (4.16) is equivalent to the following one ${\sf E}\log^{2}{\xi_{m,l,x}}\to{\sf E}\log^{2}{\xi_{l,x}}$ , $m\to\infty$ . So, in view of (4.8) to prove (4.16) it is sufficient to show that, for each $x\in A$ , a family $\left\{\log^{2}\xi_{m,l,x}\right\}_{m\geq m_{0}(x)}$ is uniformly integrable for some $m_{0}(x)\in\mathbb{N}$ . As in the proof of Theorem 1, we can verify that, for all $x\in A$ and some nonnegative $C_{0}(x)$ ,

[TABLE]

Step 2. Now our goal is to prove (4.17). For each $N\in\mathbb{N}$ , introduce $\rho(N):=\exp\{\sqrt{e_{[N-1]}}\}$ and

[TABLE]

As usual, a product over an empty set (if $N=1$ ) is equal to $1$ .

To show (4.17) we employ the following result.

Lemma 7

Let $F(u),u\in\mathbb{R}$ , be a distribution function such that $F(0)=0$ . Fix an arbitrary $N\in\mathbb{N}$ . Then

1) $\int_{\left(0,\frac{1}{\rho(N)}\right]}G_{N}(\log^{2}u)dF(u)=\int_{\left(0,\frac{1}{\rho(N)}\right]}F(u)(-h_{N}(u))du$ ,

2) $\int_{\left(\rho(N),\infty\right)}G_{N}(\log^{2}u)dF(u)=\int_{\left(\rho(N),\infty\right)}(1-F(u))h_{N}(u)du$ .

The proof of this lemma is omitted, being quite similar to that of Lemma 2. By Lemma 7, and since $G_{N_{1}}(\log^{2}u)=0$ for $u\in\left(\frac{1}{\rho(N_{1})},\rho(N_{1})\right]$, one has

[TABLE]

To simplify notation we do not indicate the dependence of $I_{i}(m,x)$ ( $i=1,2$ ) on $N_{1}$ , $l$ and $d$ .

We divide further proof into several parts.

Part (2a). First we consider $I_{1}(m,x)$. As in the proof of Theorem 1, for fixed $R_{1}>0$ and $\varepsilon_{1}>0$ appearing in the conditions of Theorem 2, the inequality $F_{m,l,x}(u)\leq(M_{q}(x,R_{1}))^{\varepsilon_{1}}V_{d}^{\varepsilon_{1}}u^{\varepsilon_{1}}$ holds for any $x\in A$, $u\in\left(0,\frac{1}{\rho(N_{1})}\right]$ and $m\geq m_{1}:=\max\left\{\left\lceil\frac{1}{\rho{(N_{1})}R_{1}^{d}}\right\rceil,l\right\}$. Taking into account that $0\leq(-h_{N_{1}}(u))\leq\frac{(-2\log u)\left(\log_{[N_{1}]}(\log^{2}u)+1\right)}{u}$ if $u\in\left(0,\frac{1}{\rho(N_{1})}\right]$, we get, for $m\geq m_{1}$,

[TABLE]

Here $U_{1}(\varepsilon,N,d):=V_{d}^{\varepsilon}L_{N,2}(\varepsilon)$ , $L_{N,2}(\varepsilon):=\int_{\left[\sqrt{e_{[N-1]}},\infty\right)}2t\left(\log_{[N]}(t^{2})+1\right)e^{-\varepsilon t}\,dt<\infty$ for each $\varepsilon>0$ and any $N\in\mathbb{N}$ .

Part (2b). Consider $I_{2}(m,x)$ . As in the proof of Theorem 1, taking into account that, for $u\in(\rho(N_{1}),\infty)$ , $h_{N_{1}}(u)\leq\frac{2\log{u}}{u}\left(\log_{[N_{1}]}(\log^{2}u)+1\right)$ , we write, for all $m\geq\max\{\rho^{2}(N_{1}),l\}$ ,

[TABLE]

where we do not indicate the dependence of $J_{j}(m,x)$ ( $j=1,2,3$ ) on $N_{1}$ and $l$ .

For $R_{2}>0$ and $\varepsilon_{2}>0$ appearing in the conditions of Theorem 2, one can prove (see Theorem 1 proof), that inequality

[TABLE]

holds for any $x\in A$ , $u\in\left(\rho(N_{1}),\sqrt{m}\right]$ and all $m\geq m_{2}:=\max\left\{\left\lceil\frac{1}{R_{2}^{2d}}\right\rceil,\left\lceil\rho^{2}(N_{1})\right\rceil,l\right\}$ . Here $S_{1}:=S_{1}(l)$ and $S_{2}$ are the same as in the proof of Theorem 1. For all $x\in A$ and $m\geq m_{2}$ , we come to the relations

[TABLE]

where $U_{2}(\varepsilon,N,d,l):=2S_{1}(l)\,L_{N,2}(\varepsilon)(S_{2}\,V_{d})^{-\varepsilon}$.

Part (2c). Now we turn to $J_{2}(m,x)$ . Take $\delta>0$ . Then, due to (4.21), for all $x\in A$ and any $m\geq m_{2}$ ,

[TABLE]

where $U_{3}(m,\varepsilon_{2},N_{1},d,l):=4S_{1}(S_{2}V_{d})^{-\varepsilon_{2}}m^{-\frac{\varepsilon_{2}}{2}}\left(\log^{2}m\right)\left(\log_{[N_{1}]}(4\log^{2}m)+1\right)\to 0$, $m\to\infty$.

Part (2d). Now we consider $J_{3}(m,x)$. Take $u=mw$. Then $J_{3}(m,x)$ has the form

[TABLE]

Due to Lemma 5 there exists $T(N)>\rho(N)$ such that

[TABLE]

Pick some $\delta>0$ and set $m_{3}:=\max\left\{l,\left\lceil(l-1)\left(1+\frac{1}{\delta}\right)\right\rceil,\left\lceil T(N_{1})\right\rceil,\left\lceil\rho(N_{1})\right\rceil\right\}$ , where $T(N)$ was introduced in (4.29). Consider $m\geq m_{3}$ . In view of Lemma 4 (for $N=N_{1}$ and $\nu=2$ ), (3.57), (4.29), (2.7) and since $w>m$ ,

[TABLE]

where $R_{N,2}(x)$ is defined in (4.7), $A(\delta,d):=4(1+\delta)a(d,2)$ and $B(\delta,d,N_{1}):=4(1+\delta)\big{(}a(d,2)G_{N_{1}}(e^{2}_{[N_{1}-1]})+b(N_{1},d,2)\big{)}$.

Part (2e). Thus, for each $x\in A$ and $m\geq\max\{m_{1},m_{2},m_{3}\}$ , taking into account (4.20), (4.24), (4.28) and (4.36), we can claim that

[TABLE]

Moreover, for any $\kappa>0$ , one can choose $m_{4}:=m_{4}(\kappa,\varepsilon_{2},N_{1},d,l)\in\mathbb{N}$ such that, for $m\geq m_{4}$ , it holds $U_{3}(m,\varepsilon_{2},N_{1},d,l)\leq\kappa$ . Then by (4.40), for each $x\in A$ and $m\geq m_{0}:=\max\{m_{1},m_{2},m_{3},m_{4}\}$ ,

[TABLE]

Hence we have proved the uniform integrability of the family $\left\{\log^{2}{\xi_{m,l,x}}\right\}_{m\geq m_{0}}$ for each $x\in A$ . Therefore, for any $x\in A$ (thus for ${\sf P}_{X}$ -almost every $x\in S(p)$ ), relation (4.16) holds.

Step 3. Now we can return to ${\sf E}\log^{2}\phi_{m,l}(1)$. Set $\Delta_{m,l}(x):={\sf E}(\log^{2}\phi_{m,l}(1)|X_{1}=x)={\sf E}\log^{2}{\xi_{m,l,x}}$. Consider $x\in A$ and take any $m\geq m_{0}$. The function $G_{N_{1}}$ is nondecreasing and, by Lemma 6, convex. By Jensen's inequality

[TABLE]

Relation (4.44) guarantees that, for each $x\in A$ and all $m\geq m_{0}$ ,

[TABLE]

We have established uniform integrability of the family $\{\Delta_{m,l}(\cdot)\}_{m\geq m_{0}}$ (w.r.t. measure ${\sf P}_{X}$ ). Therefore, we conclude that

[TABLE]

It is easily seen that finiteness of integrals $Q_{p,q}(\varepsilon_{1},R_{1})$ , $T_{p,q}(\varepsilon_{2},R_{2})$ implies that

[TABLE]

This is verified as in Remark 4 by taking into account that $\log^{2}z\leq\frac{4}{\varepsilon^{2}}z^{\varepsilon}$ for all $z\geq 1$ and $\varepsilon>0$ . Thus, ${\sf E}\log^{2}\phi_{m,l}(1)\to\tau_{2}<\infty$ . Hence ${\sf var}\left(\log\phi_{m,l}(1)\right)={\sf E}\log^{2}\phi_{m,l}(1)-\left({\sf E}\log\phi_{m,l}(1)\right)^{2}\to\tau_{2}-\tau^{2}_{1}<\infty$ , $m\to\infty$ , where $\tau_{1}:=\psi(l)-\log{V_{d}}-\int_{\mathbb{R}^{d}}p(x)\log{q(x)}\,dx$ according to (3.2). Consequently, $\frac{1}{n}{\sf var}\left(\log\phi_{m,l}(1)\right)\to 0$ as $n,m\to\infty$ .
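The bound $\log^{2}z\leq\frac{4}{\varepsilon^{2}}z^{\varepsilon}$, $z\geq 1$, follows from $\sup_{s>0}s^{2}e^{-\varepsilon s}=\frac{4}{\varepsilon^{2}e^{2}}$ (attained at $s=2/\varepsilon$, where $s=\log z$), so it even holds with the extra factor $e^{-2}$. The sketch checks this on a grid of $s=\log z$, working in log-space to avoid overflow; the grid and the values of $\varepsilon$ are arbitrary.

```python
import math
import numpy as np

s = np.linspace(1e-9, 500.0, 500_001)  # s = log z, so z ranges over (1, e^500]

def max_ratio(eps):
    # max over the grid of log^2 z / ((4 / eps^2) z^eps), computed in log-space
    log_ratio = 2.0 * np.log(s) - math.log(4.0 / eps**2) - eps * s
    return float(np.exp(log_ratio.max()))

for eps in (0.05, 0.5, 1.0, 3.0):
    print(eps, max_ratio(eps))  # about e^{-2} = 0.1353... for every eps
```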

Step 4. Now we consider ${\sf cov}(\log\phi_{m,l}(i),\log\phi_{m,l}(j))$ for $i\neq j$ , where $i,j\in\{1,\ldots,n\}$ . For $x,y\in\mathbb{R}^{d}$ , introduce conditional distribution function

[TABLE]

For $x,y\in\mathbb{R}^{d}$ , $u,w\geq 0$ , $i\neq j$ ,

[TABLE]

Here $r_{m}(a)=\left(\frac{a}{m}\right)^{\frac{1}{d}}$ for all $a\geq 0$ , as previously. One can write $\Phi_{m,l,x,y}(u,w)$ instead of ${\Phi}^{i,j}_{m,l,x,y}(u,w)$ , because the right-hand side of (4.50) does not depend on $i$ and $j$ .

Set $A_{1}:=\big{\{}(x,y):x\in A,\,y\in A,\,x\neq y\big{\}}$ and $A_{2}:=\big{\{}(x,y):x\in A,\,y\in A,\,x=y\big{\}}$ , where $A$ is introduced in (4.6). Evidently, $\left({\sf P}_{X}\otimes{\sf P}_{X}\right)(A_{1})=1$ and $\left({\sf P}_{X}\otimes{\sf P}_{X}\right)(A_{2})=0$ .

Consider $(x,y)\in A_{1}$ . Obviously, for any $a>0$ , $r_{m}(a)\to 0$ , as $m\to\infty$ . For $(x,y)\in A_{1}$ we take $m_{5}=m_{5}(u,w,\left\lVert x-y\right\rVert):=\left\lceil\left(\frac{2}{\left\lVert x-y\right\rVert}\right)^{d}\max\left\{u,w\right\}\right\rceil$ . Then $r_{m}(u)<\frac{\left\lVert x-y\right\rVert}{2}$ and $r_{m}(w)<\frac{\left\lVert x-y\right\rVert}{2}$ for all $m\geq m_{5}$ . Thus $B(x,r_{m}(u))\cap B(y,r_{m}(w))=\varnothing$ if $m\geq m_{5}$ . Consequently, for $m\geq m_{6}(u,w,\left\lVert x-y\right\rVert):=\max\big{\{}m_{5},2(l-1)\big{\}}$ ,

[TABLE]

In view of (3.10), (4.50) and (4.53), one has for $\Phi_{m,l,x,y}(u,w)$ the following representation

[TABLE]

For any fixed $(x,y)\in A_{1}$ and $u,w>0$ ,

[TABLE]

Then, according to (4.56), (3.14) and (4.59), for all fixed $u,w>0$ , $(x,y)\in A_{1}$ , one has

[TABLE]

Thus $\Phi_{l,x,y}(\cdot,\cdot)$ is a distribution function of a vector $\eta_{l,x,y}:=(\xi_{l,x},\xi_{l,y})$ , where $\xi_{l,x}\sim\Gamma(V_{d}q(x),l)$ , $\xi_{l,y}\sim\Gamma(V_{d}q(y),l)$ and the components of $\eta_{l,x,y}$ are independent. Observe also that $\Phi_{m,l,x,y}(\cdot,\cdot)$ is a distribution function of a random vector $\eta_{m,l,x,y}:=(\xi_{m,l,x},\xi_{m,l,y})$ .

Consequently, we have shown that $\eta_{m,l,x,y}\stackrel{{\scriptstyle law}}{{\rightarrow}}\eta_{l,x,y}$ as $m\to\infty$ . Therefore, for any $(x,y)\in A_{1}$ ,

[TABLE]

Here we exclude a set of zero probability on which the random variables under consideration can vanish. Note that, for all $i,j\in\mathbb{N}$, $i\neq j$,

[TABLE]

Obviously, in view of (3.20) and since $\xi_{l,x}$ and $\xi_{l,y}$ are independent, one has

[TABLE]

Now we intend to verify that, for any $(x,y)\in A_{1}$ ,

[TABLE]

Equivalently, one can prove that, for each $(x,y)\in A_{1}$ , ${\sf E}(\log\xi_{m,l,x}\log\xi_{m,l,y})\to{\sf E}(\log{\xi_{l,x}}\log{\xi_{l,y}})$ , $m\to\infty$ .
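The asymptotic independence of $\xi_{m,l,x}$ and $\xi_{m,l,y}$ for $x\neq y$, which drives (4.65), is visible in a simulation: for two fixed distinct points, the logarithms of the rescaled $l$-th nearest neighbor distances computed from the same sample $\mathbb{Y}_{m}$ are nearly uncorrelated, and each has mean close to $\psi(l)-\log V_{d}-\log q(\cdot)$. As before, $d=1$ and $Y\sim N(0,1)$ are illustrative assumptions, and `log_xi_pair` is a hypothetical helper.

```python
import numpy as np

def log_xi_pair(x, y, m, l, reps, seed=11):
    # Illustrative assumption: d = 1, Y ~ N(0, 1).
    # Each row is one sample Y_1..Y_m; returns log xi_{m,l,x} and log xi_{m,l,y},
    # where xi_{m,l,z} = m * |z - Y_(l)(z, Y_m)|, both computed from the same rows.
    rng = np.random.default_rng(seed)
    sample = rng.standard_normal((reps, m))
    out = []
    for point in (x, y):
        dist = np.abs(sample - point)
        lth = np.partition(dist, l - 1, axis=1)[:, l - 1]
        out.append(np.log(m * lth))
    return out[0], out[1]

lx, ly = log_xi_pair(-0.5, 0.7, m=1000, l=2, reps=4000)
print(np.corrcoef(lx, ly)[0, 1], lx.mean(), ly.mean())
```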

Part (4a). We establish the uniform integrability of a family $\{\log\xi_{m,l,x}\log\xi_{m,l,y}\}_{m\geq m_{0}}$ for $(x,y)\in A_{1}$ . The function $G_{N_{1}}(\cdot)$ is nondecreasing and convex. Thus, for any $(x,y)\in A_{1}$ , following the proof of Step 2, one can find $m_{0}$ (the same as in the proof of Step 2) such that, for all $m\geq m_{0}$ ,

[TABLE]

Clearly, $U_{1},U_{2},\kappa,A,B$ do not depend on $x$ or $y$ by virtue of (4.43). Hence, for any $(x,y)\in A_{1}$ , a family $\{\log\xi_{m,l,x}\log\xi_{m,l,y}\}_{m\geq m_{0}}$ is uniformly integrable. Therefore we come to (4.65) for $(x,y)\in A_{1}$ .

Part (4b). Set $T_{m,l}(x,y):={\sf E}\big{(}\log\phi_{m,l}(1)\log\phi_{m,l}(2)|X_{1}=x,X_{2}=y\big{)}$ $={\sf E}(\log\xi_{m,l,x}\,\log\xi_{m,l,y})$ , where $(x,y)\in A_{1}$ . Then (4.65) means that $T_{m,l}(x,y)\to(\psi{(l)}-\log{V_{d}}-\log{q(x)})(\psi{(l)}-\log{V_{d}}-\log{q(y)})$ for any $(x,y)\in A_{1}$ , as $m\to\infty$ . Note that

[TABLE]

Due to (4.69) and (4.72) one can conclude that, for all $m\geq m_{0}$ , as $\left({\sf P}_{X}\otimes{\sf P}_{X}\right)(A_{1})=1$ ,

[TABLE]

Hence, for $(x,y)\in A_{1}$ , a family $\big{\{}T_{m,l}(x,y)\big{\}}_{m\geq m_{0}}$ is uniformly integrable w.r.t. ${\sf P}_{X}\otimes{\sf P}_{X}$ . Consequently,

[TABLE]

Thus

[TABLE]

On the other hand, taking also into account (3.2), we come to the relation

[TABLE]

Therefore (4.76) and (4.77) imply that

[TABLE]


Step 5. Now we consider ${\sf cov}(\log\zeta_{n,k}(i),\log\zeta_{n,k}(j))$ for $i\neq j$ , where $i,j\in\{1,\ldots,n\}$ . Similar to Step 4, for $x,y\in\mathbb{R}^{d}$ and $u,w>0$ , introduce a conditional distribution function

[TABLE]

where $\widetilde{\eta}_{n,k,x}^{\,y,i,j}:=(n-1)\left\lVert x-X_{(k)}(x,\{X_{s}\}_{s\neq i,j}\cup\{y\})\right\rVert^{d}$. In what follows we write $\widetilde{\Phi}_{n,k,x,y}(u,w)$, $\widetilde{\eta}_{n,k,x}^{\,y}$ and $\widetilde{\eta}_{n,k,y}^{\,x}$ instead of $\widetilde{\Phi}^{i,j}_{n,k,x,y}(u,w)$, $\widetilde{\eta}_{n,k,x}^{\,y,i,j}$, $\widetilde{\eta}_{n,k,y}^{\,x,i,j}$, respectively (since $X_{1},X_{2},\ldots$ are i.i.d. random vectors). Moreover, $\widetilde{\Phi}_{n,k,x,y}(u,w)$ is the distribution function of the random vector $\widetilde{\eta}_{n,k,x,y}:=(\widetilde{\eta}_{n,k,x}^{\,y},\widetilde{\eta}_{n,k,y}^{\,x})$ and the regular conditional distribution function of the random vector $(\zeta_{n,k}(i),\zeta_{n,k}(j))$ given $(X_{i},X_{j})=(x,y)$. One has

[TABLE]

Introduce

[TABLE]

where the first three sets appeared in the proof of Theorem 1 (Step 5), and $\widetilde{A}_{p,2}(G_{N})$, for $N\in\mathbb{N}$ and a probability density $p$ on $\mathbb{R}^{d}$, is defined analogously to $\widetilde{A}_{p}(G_{N})$. Namely, introduce

[TABLE]

and set $\widetilde{A}_{p,2}(G_{N}):=\{x\in S(p):\widetilde{R}_{N,2}(x)<\infty\}$ . Then ${\sf P}_{X}(S(p)\setminus\widetilde{A}_{p,2}(G_{N_{2}}))=0$ since $K_{p,p}(2,N_{2})<\infty$ . It is easily seen that ${\sf P}_{X}(\widetilde{A})=1$ .

Consider $\widetilde{A}_{1}:=\big{\{}(x,y):x\in\widetilde{A},\,y\in\widetilde{A},\,x\neq y\big{\}}$ and $\widetilde{A}_{2}:=\big{\{}(x,y):x\in\widetilde{A},\,y\in\widetilde{A},\,x=y\big{\}}$ . Evidently, $\left({\sf P}_{X}\otimes{\sf P}_{X}\right)(\widetilde{A}_{1})=1$ and $\left({\sf P}_{X}\otimes{\sf P}_{X}\right)(\widetilde{A}_{2})=0$ . For any $a>0$ , $r_{m}(a)\to 0$ , as $m\to\infty$ . Hence, for $(x,y)\in\widetilde{A}_{1}$ , one can find $\widetilde{n}_{5}=\widetilde{n}_{5}(u,w,\left\lVert x-y\right\rVert)=1+\left\lceil\left(\frac{2}{\left\lVert x-y\right\rVert}\right)^{d}\max\left\{u,w\right\}\right\rceil$ such that $r_{n-1}(u)<\frac{\left\lVert x-y\right\rVert}{2}$ , $r_{n-1}(w)<\frac{\left\lVert x-y\right\rVert}{2}$ if $n\geq\widetilde{n}_{5}$ . Then $B(x,r_{n-1}(u))\cap B(y,r_{n-1}(w))=\varnothing$ if $n\geq\widetilde{n}_{5}(u,w,\left\lVert x-y\right\rVert)$ . Thus, for $n\geq\widetilde{n}_{6}:=\max\big{\{}\widetilde{n}_{5},2k\big{\}}$ , one has

[TABLE]

Therefore, for each fixed $(x,y)\in\widetilde{A}_{1}$ , $u,w>0$ , we get, as $n\to\infty$ ,

[TABLE]

Here $\widetilde{\Phi}_{k,x,y}(\cdot,\cdot)$ is the distribution function of a vector $\widetilde{\eta}_{k,x,y}:=(\widetilde{\xi}_{k,x},\widetilde{\xi}_{k,y})$ , where $\widetilde{\xi}_{k,x}\sim\Gamma(V_{d}\,p(x),k)$ , $\widetilde{\xi}_{k,y}\sim\Gamma(V_{d}\,p(y),k)$ and the components of $\widetilde{\eta}_{k,x,y}$ are independent.

Consequently, we have shown that $\widetilde{\eta}_{n,k,x,y}\stackrel{{\scriptstyle law}}{{\rightarrow}}\widetilde{\eta}_{k,x,y}$ as $n\to\infty$ . Therefore, for any $(x,y)\in\widetilde{A}_{1}$ ,

[TABLE]

Here we exclude a set of zero probability on which the random variables under consideration can vanish. Similarly to (4.62), for $i,j\in\{1,\ldots,n\}$, $i\neq j$, we write

[TABLE]

Since $\widetilde{\xi}_{k,x}$ and $\widetilde{\xi}_{k,y}$ are independent, formula (3.20) yields

[TABLE]

For any fixed $M>0$ , consider $\widetilde{A}_{1,M}:=\big{\{}(x,y)\in\widetilde{A}_{1}:\left\lVert x-y\right\rVert>M\big{\}}$ . Now our aim is to verify that, for each $(x,y)\in\widetilde{A}_{1,M}$ ,

[TABLE]

Equivalently, we can prove, for each $(x,y)\in\widetilde{A}_{1,M}$ , that

[TABLE]

Restricting attention to $(x,y)\in\widetilde{A}_{1,M}$ is essential for the rest of the proof.

Part (5a). We will establish the uniform integrability of the family $\{\log\widetilde{\eta}_{n,k,x}^{\,y}\log\widetilde{\eta}_{n,k,y}^{\,x}\}_{n\geq\widetilde{n}_{0}}$ for $(x,y)\in\widetilde{A}_{1,M}$ and some $\widetilde{n}_{0}\in\mathbb{N}$ which does not depend on $x,y$ but may depend on $M$. Then, due to (4.84), relation (4.91) would be valid for such $(x,y)$ as well.

As we have seen, the function $G_{N_{2}}(\cdot)$ is nondecreasing and convex. Hence

[TABLE]

Let us consider, for instance, ${\sf E}G_{N_{2}}(\log^{2}\widetilde{\eta}_{n,k,x}^{\,y})$. As in Step 2, we can write

[TABLE]

where

[TABLE]

As usual, a sum over the empty set is equal to $0$ (for $k=1$).

If $u\in\left(0,\frac{1}{\rho(N_{2})}\right]$, where $\rho(N):=\exp\{\sqrt{e_{[N-1]}}\}$, and $n\geq\widetilde{n}_{1}:=\left\lceil\frac{1}{\rho(N_{2})M^{d}}\right\rceil+1$, then $r_{n-1}(u)\leq M$. Since $\left\lVert x-y\right\rVert>M$, it follows that $r_{n-1}(u)<\left\lVert x-y\right\rVert$. In view of (4.97), $\widetilde{F}_{n,k,x}^{y}(u)=1-\sum_{s=0}^{k-1}\binom{n-2}{s}\big(V_{n-1,x}(u)\big)^{s}(1-V_{n-1,x}(u))^{n-2-s}$. Similarly to (3.27), one has

[TABLE]

for all $(x,y)\in\widetilde{A}_{1,M}$, $u\in\left(0,\frac{1}{\rho(N_{2})}\right]$, $n\geq\max\{\widetilde{n}_{1}(M),\widetilde{n}_{2}(R_{3})\}$, where $\widetilde{n}_{2}(R_{3}):=\max\big\{\left\lceil\frac{1}{\rho(N_{2})R_{3}^{d}}\right\rceil+1,k+1\big\}$. Consequently, $I_{1}(n,x,y)\leq U_{1}(\varepsilon_{3},N_{2},d)\left(M_{p}(x,R_{3})\right)^{\varepsilon_{3}}$ for all $(x,y)\in\widetilde{A}_{1,M}$ and $n\geq\max\left\{\widetilde{n}_{1}(M),\widetilde{n}_{2}(R_{3})\right\}$. Moreover, for all $u>0$, in view of (4.97), we have

[TABLE]
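The formula for $\widetilde{F}_{n,k,x}^{y}(u)$ can be read as a binomial tail. Given $X_{1}=x$, $X_{2}=y$ and $r_{n-1}(u)<\left\lVert x-y\right\rVert$, the ball $B(x,r_{n-1}(u))$ does not contain $y$. The $k$-th nearest neighbor of $x$ therefore lies in the ball if and only if at least $k$ of the remaining $n-2$ observations fall into it. The sketch below assumes that $V_{n-1,x}(u)$ denotes the ${\sf P}_{X}$-mass of that ball. It compares the formula with a simulation for the uniform law on $[0,1]$ (so $d=1$); all numbers are illustrative.

```python
import math
import random


def knn_cdf_binomial(k, n, mass):
    """1 - sum_{s=0}^{k-1} C(n-2, s) V^s (1 - V)^(n-2-s), i.e. P(Bin(n-2, V) >= k)."""
    trials = n - 2
    return 1.0 - sum(math.comb(trials, s) * mass ** s * (1.0 - mass) ** (trials - s)
                     for s in range(k))


def knn_cdf_monte_carlo(k, n, x, radius, reps, seed=0):
    """Fraction of runs in which at least k of n - 2 uniform points on [0, 1]
    lie within distance `radius` of x, i.e. the k-th nearest neighbour of x
    among them is within `radius`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        inside = sum(1 for _ in range(n - 2) if abs(rng.random() - x) <= radius)
        hits += inside >= k
    return hits / reps


# x = 0.5 and radius 0.05, so the ball [0.45, 0.55] has uniform mass V = 0.1
print(knn_cdf_binomial(3, 40, 0.1), knn_cdf_monte_carlo(3, 40, 0.5, 0.05, 20_000))
```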

The same reasoning as in the proof of Theorem 1 (Step 3, Part (3b)) leads to the inequalities

[TABLE]

for all $n\geq\max\left\{\widetilde{n}_{3}(R_{4}),3\right\}$ . Then similarly to (4.40), the relation

[TABLE]

is valid for all $(x,y)\in\widetilde{A}_{1,M}$ and $n\geq\widetilde{n}_{0}(M):=\max\left\{\widetilde{n}_{1},\widetilde{n}_{2},\widetilde{n}_{3},\widetilde{n}_{4}(\kappa),3\right\}$ . Here $U_{1},\widetilde{U}_{2},\kappa,A,B$ do not depend on $x$ or $y$ . Thus, in view of (4.93), one has

[TABLE]

Hence, for any $(x,y)\in\widetilde{A}_{1,M}$ , a family $\{\log\widetilde{\eta}_{n,k,x}^{\,y}\log\widetilde{\eta}_{n,k,y}^{\,x}\}_{n\geq\widetilde{n}_{0}}$ is uniformly integrable. Thus we come to (4.90) for $(x,y)\in\widetilde{A}_{1,M}$ .

Part (5b). Set $\widetilde{T}_{n,k}(x,y):={\sf E}\big(\log\zeta_{n,k}(1)\log\zeta_{n,k}(2)\,|\,X_{1}=x,X_{2}=y\big)={\sf E}\log\widetilde{\eta}_{n,k,x}^{\,y}\,\log\widetilde{\eta}_{n,k,y}^{\,x}$ for all $(x,y)\in\widetilde{A}_{1}$. The validity of relation (4.90) is equivalent to the following statement: for any $(x,y)\in\widetilde{A}_{1,M}$, $\widetilde{T}_{n,k}(x,y)\to(\psi(k)-\log V_{d}-\log p(x))(\psi(k)-\log V_{d}-\log p(y))$ as $n\to\infty$. Now take any $(x,y)\in\widetilde{A}_{1}$. Then, for any fixed $M>0$, we have proved that

[TABLE]

Note that

[TABLE]

Due to (4.105) and (4.111) one can conclude that, for all $n\geq\widetilde{n}_{0}$ ,

[TABLE]

Hence, for $(x,y)\in\widetilde{A}_{1}$ , a family $\big{\{}\widetilde{T}_{n,k}(x,y){\mathbb{I}}\{\left\lVert x-y\right\rVert>M\}\big{\}}_{n\geq\widetilde{n}_{0}}$ is uniformly integrable w.r.t. ${\sf P}_{X}\otimes{\sf P}_{X}$ . Consequently, in view of (4.90), for each $M>0$ ,

[TABLE]

Now we consider the case $\left\lVert x-y\right\rVert\leq M$ . One has $\bigcap_{s=1}^{\infty}\left\{\left\lVert X_{1}-X_{2}\right\rVert\leq\frac{1}{s}\right\}=\left\{X_{1}=X_{2}\right\}$ and ${\sf P}\left(X_{1}=X_{2}\right)=0$ as $X_{1}$ and $X_{2}$ are independent and have a density $p(x)$ w.r.t. the Lebesgue measure $\mu$ . Then

[TABLE]

Taking into account that, for an integrable function $h$ , $\int_{C}hd{\sf P}\to 0$ as ${\sf P}(C)\to 0$ , we get

[TABLE]

since ${\sf E}\log\zeta_{n,k}(1)\log\zeta_{n,k}(2)\leq\frac{1}{2}\left({\sf E}\log^{2}\zeta_{n,k}(1)+{\sf E}\log^{2}\zeta_{n,k}(2)\right)<\infty$ (the finiteness is proved in the same way as ${\sf E}\log\phi_{m,l}(1)<\infty$). Hence, for any $\gamma>0$, one can find $M_{1}=M_{1}(\gamma)>0$ such that, for all $M\in(0,M_{1}]$ and $n\geq\widetilde{n}_{0}$,

[TABLE]

Set $v(t):=\psi{(k)}-\log{V_{d}}-\log{p(t)}$ , $t\in\mathbb{R}^{d}$ . Also there exists $M_{2}=M_{2}(\gamma)>0$ such that, for all $M\in(0,M_{2}]$ ,

[TABLE]

Take $M=\min\{M_{1},M_{2}\}$ . Due to (4.114) one can find $\widetilde{n}_{7}(M,\gamma)$ such that for all $n\geq\max\{\widetilde{n}_{0},\widetilde{n}_{7}(M,\gamma)\}$ the following inequality holds

[TABLE]

So, for any $\gamma>0$ , there is $M(\gamma)>0$ such that, for all $n\geq\max\{\widetilde{n}_{0},\widetilde{n}_{7}(M,\gamma)\}$ , one has

[TABLE]

By virtue of the formula

[TABLE]

and taking into account (4.116) we come to the relation

[TABLE]

Moreover, in view of (3.3) (see Step 5 of the proof of Theorem 1), we have

[TABLE]

Therefore

[TABLE]

Step 6. Reasoning as in Steps 1–3 shows that $\frac{1}{n}{\sf var}\left(\log\zeta_{n,k}(1)\right)\to 0$ as $n\to\infty$. To prove that

[TABLE]

we write, for $i,j=1,\ldots,n$ , $u,w>0$ , $x,y\in\mathbb{R}^{d},\,x\neq y$ , $\left\lVert x-y\right\rVert>r_{n-1}(w)$ (thus $n>\frac{w}{\left\lVert x-y\right\rVert^{d}}+1$ ) and $m\in\mathbb{N}$ ,

[TABLE]

Next, we combine the estimates obtained in Steps 4 and 5 of the proof of Theorem 2. Note that we now consider $(x,y)\in A_{1}\cap\widetilde{A}_{1}$ and employ $G_{\max\{N_{1},N_{2}\}}(\cdot)$.

Thus we have established that ${\sf var}\big{(}\widehat{D}_{n,m}(k,l)\big{)}\to 0$ as $n,m\to\infty$ , hence (2.10) holds. The proof is complete. $\square$
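The consistency statement can be illustrated on simulated data. The sketch below assumes that $\widehat{D}_{n,m}(k,l)$ has the usual $k$-nearest neighbor form $\frac{1}{n}\sum_{i=1}^{n}\log\frac{m\,V_{m,l}^{d}(i)}{(n-1)R_{n,k}^{d}(i)}+\psi(k)-\psi(l)$. The exact definition is given earlier in the paper and is not reproduced in this excerpt. Here $R_{n,k}(i)$ and $V_{m,l}(i)$ are the nearest neighbor distances defined there. The sketch uses $d=1$ and brute-force neighbor search, and compares the estimate with the closed-form divergence between two Gaussian laws. The function names are hypothetical.

```python
import heapq
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(k):
    """psi(k) for an integer k >= 1."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def knn_kl_estimate(xs, ys, k, l):
    """Hypothetical 1-d sketch of a k-NN estimate of D(P || Q) from an
    X-sample xs ~ P and an independent Y-sample ys ~ Q."""
    n, m = len(xs), len(ys)
    total = 0.0
    for i, xi in enumerate(xs):
        # R_{n,k}(i): distance from X_i to its k-th nearest neighbour among the other X's
        r_nk = heapq.nsmallest(k, (abs(xi - xj) for j, xj in enumerate(xs) if j != i))[-1]
        # V_{m,l}(i): distance from X_i to its l-th nearest neighbour among the Y's
        v_ml = heapq.nsmallest(l, (abs(xi - yj) for yj in ys))[-1]
        total += math.log(m * v_ml / ((n - 1) * r_nk))  # d = 1, so R^d = R
    return total / n + digamma_int(k) - digamma_int(l)


def gaussian_kl_1d(mu1, s1, mu2, s2):
    """Closed-form D(N(mu1, s1^2) || N(mu2, s2^2))."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5


rng = random.Random(2019)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
ys = [rng.gauss(1.0, 2.0) for _ in range(1000)]
print(knn_kl_estimate(xs, ys, 3, 3), gaussian_kl_1d(0.0, 1.0, 1.0, 2.0))
```

A single run only shows that the estimate lands near the true value. Repeating it with larger $n$ and $m$, one would expect the spread of the estimates to shrink, but the sketch proves nothing about rates.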

Appendix A Proofs of auxiliary results

The proofs of Lemmas 1, 2 and 3 are similar to the proofs of Lemmas 2.5, 3.1 and 3.2 in [7]. We provide them for the sake of completeness.

Proof of Lemma 1.

Note that $\log\|x-y\|>e_{[N-1]}\geq 1$ if $\|x-y\|>e_{[N]}$ and $N\in\mathbb{N}$ . Hence, for such $x,y$ , one has $(\log\|x-y\|)^{\nu}\leq(\log\|x-y\|)^{\nu_{0}}$ if $\nu\in(0,\nu_{0}]$ . If $N\geq N_{0}$ then $G_{N}(u)\leq G_{N_{0}}(u)$ for $u\geq e_{[N-1]}\geq e_{[N_{0}-1]}$ . Thus $K_{p,q}(\nu,N)\leq K_{p,q}(\nu_{0},N_{0})<\infty$ for $\nu\in(0,\nu_{0}]$ and any integer $N\geq N_{0}$ .
Assume that $Q_{p,q}(\varepsilon_{1},R_{1})<\infty$ . Consider $Q_{p,q}(\varepsilon_{1},R)$ where $R>0$ . If $0<R\leq R_{1}$ then, for each $x\in\mathbb{R}^{d}$ , according to the definition of $M_{q}$ one has $M_{q}(x,R)\leq M_{q}(x,R_{1})$ . Consequently, $Q_{p,q}(\varepsilon_{1},R)\leq Q_{p,q}(\varepsilon_{1},R_{1})<\infty$ . Let now $R>R_{1}$ . One has

[TABLE]

Therefore

[TABLE]

Suppose now that $Q_{p,q}(\varepsilon_{1},R)<\infty$ for some $\varepsilon_{1}>0$ and $R>0$ . Then, for any $\varepsilon\in(0,\varepsilon_{1}]$ , the Lyapunov inequality yields $Q_{p,q}(\varepsilon,R)\leq(Q_{p,q}(\varepsilon_{1},R))^{\frac{\varepsilon}{\varepsilon_{1}}}<\infty$ .
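The Lyapunov step is Jensen's inequality for the concave map $t\mapsto t^{\varepsilon/\varepsilon_{1}}$. For a nonnegative random variable $Z$ and $0<\varepsilon\leq\varepsilon_{1}$, it gives ${\sf E}Z^{\varepsilon}\leq({\sf E}Z^{\varepsilon_{1}})^{\varepsilon/\varepsilon_{1}}$. Here it is applied with $Z=M_{q}(X,R)$ and $X$ distributed with density $p$. This assumes that $Q_{p,q}(\varepsilon,R)$ is the $p$-integral of $M_{q}^{\varepsilon}(\cdot,R)$, as the argument suggests. Below is a discrete sketch with arbitrary positive values and probability weights.

```python
import random


def weighted_moment(values, weights, eps):
    """Discrete analogue of Q(eps) = sum_i w_i * z_i^eps with weights summing to 1."""
    return sum(w * z ** eps for z, w in zip(values, weights))


rng = random.Random(7)
values = [rng.uniform(0.01, 50.0) for _ in range(200)]   # stand-ins for M_q(x_i, R)
raw = [rng.random() for _ in range(200)]
total = sum(raw)
weights = [t / total for t in raw]                        # a probability vector

eps1 = 2.0
for eps in (0.1, 0.5, 1.0, 1.5, 2.0):
    lhs = weighted_moment(values, weights, eps)
    rhs = weighted_moment(values, weights, eps1) ** (eps / eps1)
    print(eps, lhs <= rhs * (1 + 1e-12))
```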

Let $T_{p,q}(\varepsilon_{2},R_{2})<\infty$ . Take $0<R\leq R_{2}$ . Then, for each $x\in\mathbb{R}^{d}$ , according to the definition of $m_{q}$ we get $0\leq m_{q}(x,R_{2})\leq m_{q}(x,R)$ . Hence $T_{p,q}(\varepsilon_{2},R)\leq T_{p,q}(\varepsilon_{2},R_{2})<\infty$ . Consider $R>R_{2}$ . For each $x\in\mathbb{R}^{d}$ and every $a>0$ , the function $I_{q}(x,r)$ is continuous in $r$ on $(0,a]$ . Consider an arbitrary (fixed) $x\in S(q)\cap\Lambda(q)$ . Then there exists $\lim_{r\to 0+}I_{q}(x,r)=q(x)$ . For such $x$ , set $I_{q}(x,0):=q(x)$ . Thus $I_{q}(x,\cdot)$ is continuous on any segment $[0,a]$ . Hence, one can find $\widetilde{R}_{2}$ in $[0,R_{2}]$ such that $m_{q}(x,R_{2})=I_{q}(x,\widetilde{R}_{2})$ and there exists $R_{0}$ in $[0,R]$ such that $m_{q}(x,R)=I_{q}(x,R_{0})$ . If $R_{0}\leq R_{2}$ then $m_{q}(x,R)=m_{q}(x,R_{2})$ (since $m_{q}(x,R)\leq m_{q}(x,R_{2})$ for $R>R_{2}$ and $m_{q}(x,R)=I_{q}(x,R_{0})\geq m_{q}(x,R_{2})$ as $R_{0}\in[0,R_{2}]$ ). Assume that $R_{0}\in(R_{2},R]$ . Obviously $R_{0}>0$ as $R_{2}>0$ . One has

[TABLE]

Thus in all cases ( $R_{0}\in[0,R_{2}]$ and $R_{0}\in(R_{2},R]$ ) one has $m_{q}(x,R)\geq\left(\frac{R_{2}}{R}\right)^{d}m_{q}(x,R_{2})$ as $R_{2}<R$ . Taking into account the relation $\mu(S(q)\setminus(S(q)\cap\Lambda(q)))=0$ we come to the inequality

[TABLE]
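The estimate $m_{q}(x,R)\geq\left(\frac{R_{2}}{R}\right)^{d}m_{q}(x,R_{2})$ rests on a simple fact. For $r\in(R_{2},R]$ the ball $B(x,r)$ contains $B(x,R_{2})$, so $I_{q}(x,r)\geq\left(\frac{R_{2}}{r}\right)^{d}I_{q}(x,R_{2})$. The sketch below takes $d=1$ and the standard normal density as $q$. It assumes that (2.1) defines $I_{q}(x,r)$ as the average of $q$ over $B(x,r)$ and that $m_{q}(x,R)=\inf_{r\in(0,R]}I_{q}(x,r)$. The infimum is approximated on a grid of radii. This preserves the inequality, because the grid for $R_{2}$ is contained in the grid for $R$.

```python
import math


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def ball_average(x, r):
    """I_q(x, r) for d = 1 and standard normal q: the mass of q on
    [x - r, x + r] divided by the length 2r (V_1 = 2)."""
    return (normal_cdf(x + r) - normal_cdf(x - r)) / (2.0 * r)


def m_q_grid(x, radius, step):
    """Approximate m_q(x, R) = inf_{0 < r <= R} I_q(x, r) by the minimum over
    the radii step, 2*step, ..., R (radius should be a multiple of step)."""
    n_steps = round(radius / step)
    return min(ball_average(x, j * step) for j in range(1, n_steps + 1))


R2, R, h = 0.5, 2.0, 0.01
for x in (0.0, 1.0, 2.5):
    print(x, m_q_grid(x, R, h) >= (R2 / R) * m_q_grid(x, R2, h))
```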

Assume now that $T_{p,q}(\varepsilon_{2},R)<\infty$ for some $\varepsilon_{2}>0$ and $R>0$ . Then, for any $\varepsilon\in(0,\varepsilon_{2}]$ , the Lyapunov inequality yields $T_{p,q}(\varepsilon,R)\leq(T_{p,q}(\varepsilon_{2},R))^{\frac{\varepsilon}{\varepsilon_{2}}}<\infty$ . The proof is complete. $\square$

Proof of Lemma 2. We start with relation 1). Note that if a function $g$ is measurable and bounded on a finite interval $(a,b]$ and $\nu$ is a finite measure on the Borel subsets of $(a,b]$ then $\int_{(a,b]}g(x)\nu(dx)$ is finite. Thus, for each $a\in\left(0,\frac{1}{e_{[N]}}\right]$ , using the integration by parts formula (see, e.g., [36], p. 245) we get

[TABLE]

Assume now that $\int_{\left(0,\frac{1}{e_{[N]}}\right]}G_{N}(-\log u)\,dF(u)<\infty$ . Then by the monotone convergence theorem

[TABLE]

Clearly, the following nonnegative integral admits an estimate

[TABLE]

Therefore (A.4) implies that

[TABLE]

Letting $a\to 0+$ in (A.3), we come, by the monotone convergence theorem, to relation 1) of the lemma. Suppose now that

[TABLE]

In view of (A.6) and the equality $\int_{\left(0,\frac{1}{e_{[N]}}\right]}F(u)\left(-g_{N}(u)\right)\,du=\int_{\left(0,\frac{1}{e_{[N]}}\right]}F(u)d\left(-G_{N}(-\log u)\right)$, the monotone convergence theorem yields $\lim_{b\to 0+}\int_{(0,b]}F(u)\,d(-G_{N}(-\log u))=0$. For any $c\in(0,b)$, we obtain the inequalities

[TABLE]

Let $c=b^{2}$ ( $b\leq\frac{1}{e_{[N]}}<1$ ). Then, for all positive $b$ small enough,

[TABLE]

Thus $\int_{(0,b]}F(u)d(-G_{N}(-\log u))\geq\frac{1}{2}F(b^{2})G_{N}(-\log(b^{2}))\geq 0$. It follows that $F(b^{2})G_{N}(-\log b^{2})\to 0$ as $b\to 0+$. Hence we come to (A.5) by taking $a=b^{2}$. Then (A.3) yields relation 1).

If one of the (nonnegative) integrals appearing in 1) were infinite and the other one finite, we would arrive at a contradiction. Hence 1) is established. Relation 2) can be proved in a similar way, so we omit further details. $\square$

Proof of Lemma 3. Take $x\in S(q)\cap\Lambda(q)$ and $R>0$. Assume that $m_{q}(x,R)=0$. Since the function $I_{q}(x,r)$ defined in (2.1) is continuous in $(x,r)\in\mathbb{R}^{d}\times(0,\infty)$, there exists $\widetilde{R}\in[0,R]$ ($\widetilde{R}=\widetilde{R}(x,R)$) such that $m_{q}(x,R)=I_{q}(x,\widetilde{R})$ (recall that $I_{q}(x,0):=\lim_{r\rightarrow 0+}I_{q}(x,r)=q(x)$ for all $x\in\Lambda(q)$ by continuity). If $\widetilde{R}=0$, then $m_{q}(x,R)=q(x)>0$ as $x\in S(q)\cap\Lambda(q)$, a contradiction. Hence we have to consider $\widetilde{R}\in(0,R]$. If $I_{q}(x,\widetilde{R})=0$, then $\int_{B(x,r)}q(y)dy=0$ for any $0<r\leq\widetilde{R}$. Thus (3.12) ensures that $q(x)=0$, which contradicts $x\in S(q)\cap\Lambda(q)$. So $m_{q}(x,R)>0$ for $x\in S(q)\cap\Lambda(q)$. Thus, $S(q)\cap\Lambda(q)\subset D_{q}(R):=\{x\in S(q):m_{q}(x,R)>0\}$. It remains to note that $S(q)\setminus\Lambda(q)\subset\mathbb{R}^{d}\setminus\Lambda(q)$ and $\mu(\mathbb{R}^{d}\setminus\Lambda(q))=0$. Therefore $\mu(S(q)\setminus D_{q}(R))=0$. $\square$

Proof of Lemma 4. We verify that, for given $N\in\mathbb{N}$ and $\tau>0$ , there exist $a:=a(\tau)\geq 0$ and $b:=b(N,\tau)\geq 0$ such that, for any $c\geq 0$ ,

[TABLE]

For $c=0$ the statement is obviously true. Let $c>0$ . One can easily see that $\frac{\log_{[N]}(\tau c)}{\log_{[N]}(c)}\to 1$ as $c\to\infty$ . Hence one can find $c_{0}(N,\tau)$ such that, for all $c\geq c_{0}(N,\tau)$ , the inequality $\frac{\log_{[N]}(\tau c)}{\log_{[N]}(c)}\leq 2$ is valid. Consequently, for $c\geq c_{0}(N,\tau)$ ,

[TABLE]

For all $0\leq c\leq c_{0}(N,\tau)$ we write $G_{N}(\tau c)\leq G_{N}(\tau c_{0}(N,\tau)):=b(N,\tau)$ . Therefore, for any $c\geq 0$ , we come to (A.7). Thus, for any $\nu>0$ and $x,y\in\mathbb{R}^{d}$ , $x\neq y$ , one has

[TABLE]
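The only analytic input in this proof is the limit $\frac{\log_{[N]}(\tau c)}{\log_{[N]}(c)}\to 1$ as $c\to\infty$, where $\log_{[N]}$ is the $N$-fold iterated logarithm. The numerical look below uses $\tau=10$, chosen arbitrarily, and a few values of $N$.

```python
import math


def log_iter(t, n):
    """log_[n](t): the natural logarithm applied n times."""
    for _ in range(n):
        t = math.log(t)
    return t


def log_ratio(c, tau, n):
    """log_[n](tau * c) / log_[n](c), which should tend to 1 as c grows."""
    return log_iter(tau * c, n) / log_iter(c, n)


for n in (1, 2, 3):
    print(n, [round(log_ratio(10.0 ** p, 10.0, n), 4) for p in (6, 50, 300)])
```

The convergence is slow for larger $N$, which is why the proof only needs some finite $c_{0}(N,\tau)$ beyond which the ratio stays below $2$.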

Proof of Lemma 6. For $t\in[0,e_{[N-1]}]$, the function $G_{N}(t)\equiv 0$ is convex. We show that $G_{N}$ is convex on $(e_{[N-1]},\infty)$. Consider $t>e_{[N-1]}$. Write $\prod\limits_{\varnothing}:=1$ and $\sum\limits_{\varnothing}:=0$. Then, for $N\in\mathbb{N}$,

[TABLE]

Obviously, $\left(\frac{1}{\log_{[k]}(t)}\right)^{\prime}=-\frac{1}{t\log_{[k]}^{2}(t)}\prod_{s=1}^{k-1}\frac{1}{\log_{[s]}(t)}$ , $k\in\mathbb{N}$ . Thus, for $t>e_{[N-1]}$ , we get

[TABLE]

For $N=1$ and $t>0$ , we have $\left(G_{1}(t)\right)^{\prime\prime}=\frac{1}{t}>0$ . Take now $N>1$ . Clearly, for $t>e_{[N-1]}$ , one has $\frac{1}{t}\prod\limits_{j=1}^{N-1}\frac{1}{\log_{[j]}(t)}>0$ because $\log_{[j]}(t)>\log_{[j]}(e_{[N-1]})=e_{[N-1-j]}\geq 1>0$ when $1\leq j\leq N-1$ . Observe also that

[TABLE]

The last inequality is established by induction on $N$. Thus, in view of (A.8), we have proved that, for all $t>e_{[N-1]}$ and $N\in\mathbb{N}$, the inequality $(G_{N}(t))^{\prime\prime}>0$ holds. Hence, the function $G_{N}(t)$ is (strictly) convex on $\left(e_{[N-1]},\infty\right)$.

Let $h:[a,\infty)\to\mathbb{R}$ be a continuous nondecreasing function. If the restrictions of $h$ to $[a,b]$ and $(b,\infty)$ (where $a<b$) are convex functions, then, in general, it is not true that $h$ is convex on $[a,\infty)$. However, we can show that $G_{N}$ is convex on $[0,\infty)$. Note that the function $G_{N}$ is convex on $[e_{[N-1]},\infty)$, since it is convex on $(e_{[N-1]},\infty)$ and continuous on $[e_{[N-1]},\infty)$. Take now any $z\in[0,e_{[N-1]}]$, $y\in(e_{[N-1]},\infty)$ and $s\in[0,1]$. Then $G_{N}(sz+(1-s)y)\leq G_{N}(se_{[N-1]}+(1-s)y)\leq sG_{N}(e_{[N-1]})+(1-s)G_{N}(y)=(1-s)G_{N}(y)=sG_{N}(z)+(1-s)G_{N}(y)$, since $G_{N}$ is nondecreasing and $G_{N}(z)=G_{N}(e_{[N-1]})=0$. Thus, for each $N\in\mathbb{N}$, the function $G_{N}(\cdot)$ is convex on $\mathbb{R}_{+}$. $\square$
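As a sanity check of Lemma 6, one can evaluate second differences of $G_{N}$ on a grid. The sketch assumes $G_{N}(t)=0$ for $t\in[0,e_{[N-1]}]$ and $G_{N}(t)=t\log_{[N]}(t)$ for $t>e_{[N-1]}$, with $e_{[0]}=1$ and $e_{[N]}=\exp\{e_{[N-1]}\}$. These definitions appear earlier in the paper and are reconstructed here from facts used in this appendix: $G_{1}''(t)=1/t$, $G_{N}(e_{[N-1]})=0$, and $\log_{[j]}(e_{[N-1]})=e_{[N-1-j]}\geq 1$. Nonnegative second differences are consistent with convexity but do not prove it.

```python
import math


def e_iter(n):
    """e_[n] with the assumed convention e_[0] = 1, e_[n] = exp(e_[n-1])."""
    value = 1.0
    for _ in range(n):
        value = math.exp(value)
    return value


def log_iter(t, n):
    """log_[n](t): the natural logarithm applied n times."""
    for _ in range(n):
        t = math.log(t)
    return t


def G(t, n):
    """Assumed G_N: 0 on [0, e_[N-1]] and t * log_[N](t) beyond it."""
    if t <= e_iter(n - 1):
        return 0.0
    return t * log_iter(t, n)


h = 0.25
for n in (1, 2, 3):
    vals = [G(j * h, n) for j in range(401)]      # grid on [0, 100]
    second = [a - 2 * b + c for a, b, c in zip(vals, vals[1:], vals[2:])]
    print(n, min(second) >= -1e-9)
```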

Proof of Corollary 5. The proof (i.e., checking the conditions of both Theorems 1 and 2) is quite similar to the proof of Corollary 2.11 in [7].

Acknowledgements. The authors are grateful to Professor A. Tsybakov for useful discussions. This work is supported by the Lomonosov Moscow State University under grant “Modern Problems of the Fundamental Mathematics and Mechanics”.

Bibliography

[1] Alonso-Ruiz, P. and Spodarev, E. (2016). Entropy-based inhomogeneity detection in fiber materials. Methodol. Comput. Appl. Probab. Published online: 27 November 2017, doi.org/10.1007/s11009-017-9603-2.
[2] Berrett, T.B., Samworth, R.J. and Yuan, M. (2019). Efficient multivariate entropy estimation via k-nearest neighbour distances. Ann. Statist. 47, 288–318.
[3] Biau, G. and Devroye, L. (2015). Lectures on the Nearest Neighbor Method. Springer, Cham.
[4] Billingsley, P. (1999). Convergence of Probability Measures, 2nd edn. John Wiley, New York.
[5] Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer, Singapore.
[6] Borkar, V.S. (1995). Probability Theory. An Advanced Course. Springer, New York.
[7] Bulinski, A. and Dimitrov, D. (2019). Statistical estimation of the Shannon entropy. Acta Mathematica Sinica, English Series 35, 17–46.
[8] Bulinski, A. and Kozhevin, A. (2018). Statistical estimation of conditional Shannon entropy. ESAIM: Probability and Statistics. Published online: November 28, 1–35.