Direct Estimation of Information Divergence Using Nearest Neighbor Ratios
Morteza Noshad, Kevin R. Moon, Salimeh Yasaei Sekeh, Alfred O. Hero, III

TL;DR
This paper introduces a graph-based method for directly estimating Rényi and f-divergences from sample data, achieving optimal convergence rates and improved computational efficiency over existing techniques.
Contribution
The authors develop a novel graph-theoretic estimator for divergence measures that attains parametric convergence rates and is more computationally efficient than previous methods.
Findings
Estimator achieves MSE rate of O(N^{-2γ/(γ+d)}) for γ-Hölder smooth functions.
Ensemble estimator attains parametric MSE rate of O(1/N) under certain conditions.
Method is computationally more tractable than competing divergence estimators.
Morteza Noshad, University of Michigan, Electrical Engineering and Computer Science, Ann Arbor, Michigan, U.S.A
Kevin R. Moon, Yale University, Genetics and Applied Math Departments, New Haven, Connecticut, U.S.A
Salimeh Yasaei Sekeh, University of Michigan, Electrical Engineering and Computer Science, Ann Arbor, Michigan, U.S.A
Alfred O. Hero III, University of Michigan, Electrical Engineering and Computer Science, Ann Arbor, Michigan, U.S.A
Abstract
We propose a direct estimation method for Rényi and f-divergence measures based on a new graph theoretical interpretation. Suppose that we are given two sample sets $X$ and $Y$, respectively with $N$ and $M$ samples, where $\eta := M/N$ is a constant value. Considering the $k$-nearest neighbor ($k$-NN) graph of $Y$ in the joint data set $(X, Y)$, we show that the average powered ratio of the number of $X$ points to the number of $Y$ points among all $k$-NN points is proportional to Rényi divergence of $X$ and $Y$ densities. A similar method can also be used to estimate f-divergence measures. We derive bias and variance rates, and show that for the class of $\gamma$-Hölder smooth functions, the estimator achieves the MSE rate of $O(N^{-2\gamma/(\gamma+d)})$. Furthermore, by using a weighted ensemble estimation technique, for density functions with continuous and bounded derivatives of up to the order $d$, and some extra conditions at the support set boundary, we derive an ensemble estimator that achieves the parametric MSE rate of $O(1/N)$. Our estimator requires no boundary correction, and remarkably, the boundary issues do not show up. Our approach is also more computationally tractable than other competing estimators, which makes it appealing in many practical applications.
This research was partially supported by ARO grant W911NF-15-1-0479.
I Introduction
Shannon entropy, mutual information, and the Kullback-Leibler (KL) divergence are major information theoretic measures. Shannon entropy can measure diversity or uncertainty of samples, while KL-divergence is a measure of dissimilarity between two probability distributions, and mutual information is a measure of dependency between two random variables [1]. Rényi proposed a divergence measure which generalizes KL-divergence [2]. F-divergence is another general family which is also well studied, and comprises many important divergence measures such as KL-divergence, total variation distance, and $\alpha$-divergence [3]. These measures have a wide range of applications in information and coding theory, statistics and machine learning [1, 4, 5].
A major class of estimators for these measures is called non-parametric, for which minimal assumptions on the density functions are made, in contrast to parametric estimators. One approach used for this class is plug-in estimation, in which we find an estimate of a distribution function and then plug it into the measure function. $k$-Nearest Neighbor ($k$-NN) and Kernel Density Estimator (KDE) methods are examples of this approach. Another approach is direct estimation, in which we find a relationship between the measure function and a functional in Euclidean space. In a seminal work in 1959, Beardwood et al. derived the asymptotic behavior of the weighted functional of minimal graphs such as $k$-NN and TSP of i.i.d. random points [6]. They showed that the sum of weighted edges of these graphs converges to the integral of a weighted density function, which can be interpreted as Rényi entropy. Since then, this work has been of great interest in the signal processing and machine learning communities. More recent studies of direct graph theoretical approaches include the estimation of Rényi entropy using minimal graphs [7], in which the authors investigate the convergence rates, as well as the estimation of Henze-Penrose divergence using MST graphs [8]. Yet the extension to Rényi divergence and f-divergences has remained an open question. Moreover, among the various estimators of information measures, developing accurate and computationally tractable approaches has often been a challenge. Therefore, for practical and computational reasons, direct graphical algorithms have received attention in the literature, including in this work.
In this work, we propose an estimation method for Rényi and f-divergences based on a direct graph estimation method. We show that given two sample sets $X$ and $Y$ with respective densities of $f_1$ and $f_2$, and the $k$-nearest neighbor ($k$-NN) graph of $Y$ in the joint data set $(X, Y)$, the average powered ratio of the number of $X$ points to the number of $Y$ points among all $k$-NN points converges to the Rényi divergence. Using this fact, we design a consistent estimator for the Rényi and f-divergences.
Unlike most distance-based divergence estimators, our proposed estimator can use non-Euclidean metrics, which makes it appealing in many information theoretic and machine learning applications. Our estimator requires no boundary correction, and surprisingly, the boundary issues do not show up. This is because the proposed estimator automatically cancels the extra bias of the boundary points in the ratio of nearest neighbor points. Our approach is more computationally tractable than other estimators, with a time complexity of , required to construct the $k$-NN graph [9]. For example for we get the complexity of . We show that for the class of $\gamma$-Hölder smooth functions, the estimator achieves the MSE rate of $O(N^{-2\gamma/(\gamma+d)})$. Furthermore, by using the theory of optimally weighted ensemble estimation [10, 5], for density functions with continuous and bounded derivatives of up to the order $d$, and some extra conditions at the support set boundary, we derive an ensemble estimator that achieves the optimal MSE rate of $O(1/N)$, which is independent of the dimension. Finally, the current work is an important step towards extending the direct estimation method studied in [11, 12] to more general information theoretic measures.
Several previous works have investigated estimators for particular types of divergence measures. $k$-NN [13], KDE [14], and histogram [15] estimators are among the studied plug-in estimators for the f-divergence family. In general, most of these estimators suffer from restrictions such as a lack of analytic convergence rates or high computational complexity.
Recent works have focused on the MSE convergence rates for plug-in divergence estimators, such as KDE. Singh and Póczos proposed estimators for general density functionals and Rényi divergence, based on the kernel density plug-in estimator [14][16], which can achieve the convergence rate of when the densities are at least times differentiable. In a similar approach, Kandasamy et al proposed another KDE-based estimator for general density functionals and divergence measures, which can achieve the convergence rate of when the densities are at least differentiable [17].
Moon et al proposed simple kernel density plug-in estimators using weighted ensemble methods to improve the rate [10][18]. The proposed estimator can achieve the convergence rate when the densities are at least times differentiable. The main drawback of these estimators is handling the bias at the support set boundary. For example, using the estimators proposed in [14, 17] requires knowledge of the densities’ support set and numerous computations at the support boundary, which become complicated when the dimension increases. To circumvent this issue, Moon et al [10] assumed smoothness conditions at the support set boundary, which may not always be true in practice. In contrast, our basic estimator does not require any smoothness assumptions on the support set boundary although our ensemble estimator does. Regarding the algorithm time complexities, our estimator spends time versus the time complexity of KDE based estimators which spend time.
A rather different method for estimating f-divergences is suggested by Nguyen et al [19], which is based on a variational representation of f-divergences that connects the estimation problem to a convex risk minimization problem. This approach achieves the parametric rate of when the likelihood ratio is at least times differentiable. However, the algorithm’s time complexity is even worse than .
II A direct estimator of divergence measures
In this section, we first introduce the Rényi and f-divergence measures. Then we propose an estimator based on a graph theoretical interpretation, and we outline our main theoretical results, which will be proven in section III.
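Consider two density functions $f_1$ and $f_2$ with support $\mathcal{M} \subseteq \mathbb{R}^d$. The Rényi divergence between $f_1$ and $f_2$ is
$$D_\alpha(f_1 \| f_2) := \frac{1}{\alpha-1} \log \int_{\mathcal{M}} f_1(x)^{\alpha} f_2(x)^{1-\alpha} \, dx = \frac{1}{\alpha-1} \log J_\alpha(f_1, f_2), \tag{1}$$
where in the second expression, $J_\alpha(f_1, f_2)$ is defined as
$$J_\alpha(f_1, f_2) := \mathbb{E}_{f_2}\left[\left(\frac{f_1(x)}{f_2(x)}\right)^{\alpha}\right].$$
Another general divergence family, f-divergence, is defined as follows [3]:
$$D_g(f_1 \| f_2) := \int_{\mathcal{M}} g\left(\frac{f_1(x)}{f_2(x)}\right) f_2(x) \, dx = \mathbb{E}_{f_2}\left[g\left(\frac{f_1(x)}{f_2(x)}\right)\right], \tag{2}$$
where $g$ is a smooth and convex function such that $g(1) = 0$. KL-divergence, Hellinger distance and total variation distance are particular cases of this family. Note that for our approach, we only assume that $g$ is smooth.
We assume that the densities are lower bounded by $C_L > 0$ and upper bounded by $C_U$. Also $f_1$ and $f_2$ belong to the Hölder smoothness class with parameter $\gamma$:
- Definition
Given a support $\mathcal{X} \subseteq \mathbb{R}^d$, a function $f: \mathcal{X} \to \mathbb{R}$ is called Hölder continuous with parameter $0 < \gamma \le 1$, if there exists a positive constant $G_f$, depending on $f$, such that
$$|f(y) - f(x)| \le G_f \|y - x\|^{\gamma} \tag{3}$$
for every $x \ne y \in \mathcal{X}$.
The function $g$ in (2) is also assumed to be Lipschitz continuous; i.e. $g$ is Hölder continuous with $\gamma = 1$.
Remark 1
The $\gamma$-Hölder smoothness family comprises a large class of continuous functions including continuously differentiable functions and Lipschitz continuous functions. Also note that for $\gamma > 1$, any $\gamma$-Hölder continuous function on any bounded and continuous support is constant.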
- Nearest Neighbor Ratio (NNR) Estimator:
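Consider the i.i.d. samples $X = \{X_1, \dots, X_N\}$ drawn from $f_1$ and $Y = \{Y_1, \dots, Y_M\}$ drawn from $f_2$. We define the set $Z := X \cup Y$, and consider the $k$-NN points for each of the points in the set $Y$, which is represented by $Q_k(Y_i)$. Let $N_i$ and $M_i$ be the number of points of the sets $X$ and $Y$ among the $k$-NN points of $Y_i$, respectively. Then an estimator for Rényi divergence is
$$\hat{D}_\alpha(X, Y) := \frac{1}{\alpha-1} \log\left(\frac{\eta^{\alpha}}{M} \sum_{i=1}^{M} \left(\frac{N_i}{M_i+1}\right)^{\alpha}\right), \tag{4}$$
where $\eta := M/N$. Similarly, using the alternative form in (1), we have
$$\hat{J}_\alpha(X, Y) := \frac{\eta^{\alpha}}{M} \sum_{i=1}^{M} \left(\frac{N_i}{M_i+1}\right)^{\alpha}. \tag{5}$$
Note that the estimator defined in (4) can be negative and unstable in extreme cases. To correct this, we propose the NNR estimator for Rényi divergence denoted by $\tilde{D}_\alpha(X, Y)$: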
[TABLE]
The NNR f-divergence estimator is defined as
[TABLE]
where .
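The intuition behind the proposed estimators is that the ratio $\eta N_i/(M_i+1)$ can be considered an estimate of the density ratio $f_1/f_2$ at $Y_i$. Note that if the densities $f_1$ and $f_2$ are almost equal, then for each point $Y_i$, $\eta N_i \approx M_i + 1$, and therefore both the Rényi and the f-divergence estimates tend to zero. In the following theorems we derive upper bounds on the bias and variance rates. Consider the bias and variance definitions as $\mathbb{B}[\hat{T}] = \mathbb{E}[\hat{T}] - T$ and $\mathbb{V}[\hat{T}] = \mathbb{E}[\hat{T}^2] - \mathbb{E}[\hat{T}]^2$, respectively, where $\hat{T}$ is an estimator of the parameter $T$.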
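The counting procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the ratio form $\eta N_i/(M_i+1)$ with $\eta = M/N$, uses a brute-force neighbor search with the Euclidean metric, and the names `knn_label_counts`, `nnr_renyi` and the guard `j_floor` are hypothetical. The guard only avoids `log(0)` and stands in for, but is not, the correction the paper applies to obtain the NNR estimator.

```python
import math
import random


def knn_label_counts(X, Y, k):
    """For each Y_i, return (N_i, M_i): how many of its k nearest neighbours
    in the pooled sample (X and Y, excluding Y_i itself) come from X and Y."""
    counts = []
    for i, y in enumerate(Y):
        # (distance, came_from_X) for every other pooled point; brute force.
        pool = [(math.dist(y, x), True) for x in X]
        pool += [(math.dist(y, other), False)
                 for j, other in enumerate(Y) if j != i]
        nearest = sorted(pool, key=lambda t: t[0])[:k]
        n_i = sum(1 for _, from_x in nearest if from_x)
        counts.append((n_i, len(nearest) - n_i))
    return counts


def nnr_renyi(X, Y, k, alpha, j_floor=1e-12):
    """Nearest-neighbour-ratio sketch of the Renyi-alpha divergence D(f1 || f2),
    where X ~ f1 and Y ~ f2. j_floor is an illustrative guard against log(0)."""
    eta = len(Y) / len(X)
    ratios = [eta * n_i / (m_i + 1) for n_i, m_i in knn_label_counts(X, Y, k)]
    j_hat = sum(r ** alpha for r in ratios) / len(Y)
    return math.log(max(j_hat, j_floor)) / (alpha - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    X = [(rng.gauss(0.0, 1.0),) for _ in range(300)]
    Y = [(rng.gauss(2.0, 1.0),) for _ in range(300)]
    print(nnr_renyi(X, Y, k=20, alpha=0.5))
```

Points are passed as tuples so the same code works in any dimension; swapping `math.dist` for another metric is the only change needed to try a non-Euclidean distance. For larger samples a spatial index would replace the quadratic search.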
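Theorem II.1
The bias of the NNR estimator for Rényi divergence, defined in (6), can be bounded as
$$\mathbb{B}\left[\tilde{D}_\alpha(X, Y)\right] = O\left(\left(\frac{k}{N}\right)^{\gamma/d}\right) + O\left(\frac{1}{k}\right). \tag{8}$$
Here $\gamma$ is the Hölder smoothness parameter.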
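Theorem II.2
The variance of the NNR estimator is
$$\mathbb{V}\left[\tilde{D}_\alpha(X, Y)\right] \le O\left(\frac{1}{N}\right) + O\left(\frac{1}{M}\right). \tag{9}$$
Remark 2
The same variance bound holds true for the RV $\hat{J}_\alpha(X, Y)$. Also, the bias and variance results easily extend to the f-divergence estimator.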
Remark 3
Note that in most cases, the $1/k$ term in (8) is the dominant error term, and in order to have an asymptotically unbiased NNR estimator, $k$ should be a growing function of $N$. The term $1/k$ actually comes from the error of the Poissonization technique used in the proof. By equating the terms $O((k/N)^{\gamma/d})$ and $O(1/k)$, it turns out that for $k = O(N^{\gamma/(\gamma+d)})$, we get the optimal MSE rate of $O(N^{-2\gamma/(\gamma+d)})$. The optimal choice for $k$ can be compared to the optimum value in [4], where a plug-in KNN estimator is used. Also considering the computational complexity of to construct the $k$-NN graph [9], we see that there is a trade-off between MSE rate and complexity for different values of $k$. In the particular case of optimal MSE, the computational complexity of this method is .
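The balancing argument in Remark 3 can be checked numerically. The sketch below assumes the two competing bias terms are $(k/N)^{\gamma/d}$ and $1/k$ with unit constants, which is an idealization: the real constants are unknown, so `balanced_k` gives an order of magnitude rather than a tuned value. All function names are hypothetical.

```python
import math


def bias_terms(k, n, gamma, d):
    """The two competing bias terms (constants set to 1 for illustration)."""
    return (k / n) ** (gamma / d), 1.0 / k


def balanced_k(n, gamma, d):
    """Value of k at which the two bias terms are equal: k = n^(gamma/(gamma+d))."""
    return n ** (gamma / (gamma + d))


def mse_rate_exponent(gamma, d):
    """Exponent e such that the squared bias at the balanced k scales as n^e."""
    return -2.0 * gamma / (gamma + d)


if __name__ == "__main__":
    n, gamma, d = 10 ** 6, 1.0, 2
    k = balanced_k(n, gamma, d)
    smooth_term, poisson_term = bias_terms(k, n, gamma, d)
    print(k, smooth_term, poisson_term, mse_rate_exponent(gamma, d))
```

Since the variance is of lower order, the squared bias at the balanced $k$ sets the MSE rate, and the exponent shrinks toward zero as $d$ grows, which is the usual dimension penalty that the ensemble variant is designed to remove.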
Under extra conditions on the densities and support set boundary, we can improve the bias rate by applying the ensemble theory in [10, 5]. Assume that the density functions are in the Hölder space , which consists of functions on continuous derivatives up to order and the th partial derivatives are Hölder continuous with exponent . We also assume that the density derivatives up to order vanish at the boundary. Let be a set of index values with . Let . The weighted ensemble estimator is defined as , where is the NNR estimator of Rényi -divergence, using the -NN graph.
Theorem II.3
Let and be the solution to:
[TABLE]
Then the MSE rate of the ensemble estimator is $O(1/N)$.
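The optimization problem in Theorem II.3 is not recoverable from this copy, so the following is only a sketch of the general technique. It assumes the form commonly used in optimally weighted ensemble estimation: minimize $\|w\|_2$ subject to $\sum_l w(l) = 1$ and $\sum_l w(l)\, l^{i/d} = 0$ for $i = 1, \dots, d$. Under that assumption the minimizer has the closed form $w = C^\top (C C^\top)^{-1} b$, where the rows of $C$ are the constraints. The function names and the index values are hypothetical.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def min_norm_weights(ls, d):
    """Minimum-l2-norm weights w with sum(w) = 1 and sum(w * l^(i/d)) = 0
    for i = 1..d (assumed constraint form; needs more than d distinct l > 0)."""
    C = [[1.0] * len(ls)]
    C += [[l ** (i / d) for l in ls] for i in range(1, d + 1)]
    rhs = [1.0] + [0.0] * d
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in C] for r1 in C]
    lam = solve_linear(gram, rhs)
    return [sum(lam[r] * C[r][j] for r in range(len(C)))
            for j in range(len(ls))]


if __name__ == "__main__":
    print(min_norm_weights([1.0, 1.5, 2.0, 2.5, 3.0], d=2))
```

The resulting weights are generally of mixed sign, so the ensemble estimate is a signed combination of base estimates rather than an average; keeping $\|w\|_2$ small is what prevents the variance from growing.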
III Proof
In this section we derive the bias terms of the NNR estimator. The variance bound for the NNR estimator is more straightforward and can be derived using the Efron-Stein inequality. For proving the MSE rate of the ensemble variant of the NNR estimator, we need more accurate bias rates, which are provided in the arXiv version. So, for the variance and ensemble estimation proofs we refer the reader to the Appendix of the arXiv version of the paper. First, we provide a smoothness lemma for the densities. Unless stated otherwise, all proofs of lemmas are provided in the arXiv version.
Lemma III.1
Suppose that the density function belongs to the -Hölder smoothness class. Then if denotes the sphere with center and radius , where is defined as the -NN distance on the point , we have the following smoothness condition:
[TABLE]
where , and we have for a fixed .
We first state the bias proof for Rényi divergence, and then we extend the method to f-divergence. It is easier to work with defined in (5), instead of . The following lemma provides the essential tool to make a relation between and .
Lemma III.2
Assume that is Lipschitz continuous with constant . If is a RV estimating a constant value with the bias and the variance , then the bias of can be upper bounded by
[TABLE]
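The display for Lemma III.2 is missing in this copy. The steps named in its proof (triangle inequality, Lipschitz condition, Cauchy-Schwarz) yield a bound of the form $|\mathbb{E}[g(\hat{Z})] - g(Z)| \le H_g \left(\sqrt{\mathbb{V}[\hat{Z}]} + |\mathbb{B}[\hat{Z}]|\right)$, where $H_g$ denotes the Lipschitz constant; the exact statement in the paper may differ. The sketch below checks that inequality exactly on a small discrete distribution. The function name and the example values are hypothetical.

```python
import math


def lipschitz_bias_bound(values, probs, g, lipschitz_const, target):
    """Return (lhs, rhs) for |E[g(Z_hat)] - g(Z)| <= H * (sqrt(Var) + |bias|),
    computed exactly for a discrete estimator Z_hat with the given support."""
    mean = sum(p * v for v, p in zip(values, probs))
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    bias = mean - target
    lhs = abs(sum(p * g(v) for v, p in zip(values, probs)) - g(target))
    rhs = lipschitz_const * (math.sqrt(variance) + abs(bias))
    return lhs, rhs


if __name__ == "__main__":
    # g = log is Lipschitz with constant 2 on [0.5, 2]; the target value is 1.
    lhs, rhs = lipschitz_bias_bound(
        values=[0.6, 0.9, 1.1, 1.6],
        probs=[0.1, 0.4, 0.4, 0.1],
        g=math.log,
        lipschitz_const=2.0,
        target=1.0,
    )
    print(lhs, rhs)
```

In the proof this is what transfers a bias and variance bound on the inner average to the estimator obtained by applying the logarithm, which is Lipschitz once the densities are bounded away from zero and infinity.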
An immediate consequence of this lemma is
[TABLE]
where is a constant.
From theorem II.2, , so we only need to bound . If , we have:
[TABLE]
Now note that and are not independent since . We use the Poissonizing technique [20][21] and assume that , where is a Poisson random variable with mean . We represent the Poissonized variant of by , and we will show that . By partitioning theorem for a Poisson random variable with Bernoulli trials of probabilities and , we argue that and are two independent Poisson RVs. We first compute and as follows:
Lemma III.3
Let . The probability that the point respectively belongs to the sets and is equal to
[TABLE]
Using the conditional independence of and we write
[TABLE]
can be simplified as
[TABLE]
Also similarly,
[TABLE]
Lemma III.4
If is a Poisson random variable with the mean , then
[TABLE]
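Two standard Poisson facts are relevant to this part of the proof, and both can be checked numerically. The first is the partitioning (thinning) property: if a Poisson number of points is labeled independently with probabilities $p$ and $1-p$, the two label counts are independent Poisson variables. The second is the identity $\mathbb{E}[1/(U+1)] = (1 - e^{-\lambda})/\lambda$ for $U \sim \mathrm{Poisson}(\lambda)$. The identity itself is exact, but that it coincides with the lost display of Lemma III.4 is an assumption. Function names are hypothetical.

```python
import math


def poisson_pmf(u, lam):
    """P(U = u) for U ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + u * math.log(lam) - math.lgamma(u + 1))


def expected_inverse_shifted(lam, terms=400):
    """E[1 / (U + 1)] by direct summation of the series (truncated)."""
    return sum(poisson_pmf(u, lam) / (u + 1) for u in range(terms))


def inverse_shifted_closed_form(lam):
    """Closed form (1 - exp(-lam)) / lam of E[1 / (U + 1)]."""
    return (1.0 - math.exp(-lam)) / lam


def thinning_gap(total_mean, p, n, m):
    """|P(N=n, M=m)| difference between 'Poisson total with Bernoulli(p) labels'
    and the product of independent Poisson(total_mean*p), Poisson(total_mean*(1-p))."""
    total = n + m
    joint = (poisson_pmf(total, total_mean)
             * math.comb(total, n) * p ** n * (1 - p) ** m)
    product = (poisson_pmf(n, total_mean * p)
               * poisson_pmf(m, total_mean * (1 - p)))
    return abs(joint - product)


if __name__ == "__main__":
    print(expected_inverse_shifted(20.0), inverse_shifted_closed_form(20.0))
    print(thinning_gap(total_mean=20.0, p=0.3, n=5, m=12))
```

The closed form shows why a term of order $1/k$ appears: when the mean is proportional to $k$, the expectation of the inverse count behaves like the inverse of the mean up to an exponentially small correction.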
Using this lemma for yields
[TABLE]
here is some positive constant. Therefore, (16) becomes
[TABLE]
Using lemma III.2 and theorem II.2, we obtain
[TABLE]
By applying an equation similar to (III), we get
[TABLE]
Lemma III.5
De-Poissonizing adds error:
[TABLE]
At this point the bias proof of NNR estimator for Rényi divergence is complete, and since and are of higher order compared to , we obtain the final bias rate in (8). The bias proof of NNR estimator for f-divergence is similar, and by using the lemma III.2 for , we can follow the same steps to prove the bias bound. The complete proof is provided in the arXiv version.
IV Numerical Results
In this section we provide numerical results to show the consistency of the proposed estimator and compare the estimation quality in terms of different parameters such as $N$ and $k$. In our experiments, we choose i.i.d. samples for $X$ and $Y$ from different independent distributions such as Gaussian, truncated Gaussian and uniform distributions.
The first experiment, shown in Figure 1, shows the mean estimated KL-divergence as N grows for equal to . The divergence measure is between a 2D Gaussian RV with mean and variance of , and a uniform distribution with . For each case we repeat the experiment times, and compute the mean of the estimated value and the standard deviation error bars. For small sample sizes, smaller results in smaller bias error, which is due to the bias term. As grows, we get larger bias for small values of , which is due to the fact that the term dominates. If we compare the standard deviations for different values of at , they are almost equal, which verifies the fact that variance is independent of .
Figure 2 shows the MSE of the NNR estimator of Rényi divergence with for two independent, truncated normal RVs. The RVs are 2D with means and covariance matrices and , where is a diagonal matrix of size . Both of the RVs are truncated with the range and . In this figure we show the MSE for three different sample sizes of , and for different values of $k$. As $k$ increases initially, MSE decreases due to the $1/k$ bias term. After reaching an optimal point, MSE increases as $k$ increases, indicating that the other bias terms begin to dominate. The optimal $k$ increases with the sample size, which validates our theory.
Figure 3 shows the MSE of the NNR estimator of Rényi divergence with versus , for two i.i.d. Normal RVs for three different dimension sizes: , and . is fixed so that the term in the bias can be ignored relative to the term. As dimension grows, the MSE decreases almost linearly in the logarithmic scale, which verifies the bias term.
Finally in Figure 4, we compare our estimator with two standard plug-in estimators, $k$-NN and KDE. For each of these estimators we estimate the density at each , and then compute the divergence measure using the definition in (1). The graph shows the MSE for Rényi divergence () between two Gaussian random variables with the same mean and different variances () as a function of sample size, $N$. For both the NNR and $k$-NN estimators we use the optimal value for $k$, and for the KDE estimator we use the optimal bandwidth. According to this figure, the NNR estimator outperforms the other methods.
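Computing an MSE curve like the one in Figure 4 requires the true divergence. For one-dimensional Gaussians with a common mean, the Rényi divergence has a closed form, sketched below together with a numerical cross-check. The dimension, variances and $\alpha$ used in the paper's experiment are not recoverable from this copy, so the values here are illustrative assumptions and the function names are hypothetical.

```python
import math


def renyi_gaussian_closed_form(sigma1, sigma2, alpha):
    """D_alpha(N(0, sigma1^2) || N(0, sigma2^2)) in closed form.
    Requires alpha * sigma2^2 + (1 - alpha) * sigma1^2 > 0 and alpha != 1."""
    var_mix = alpha * sigma2 ** 2 + (1 - alpha) * sigma1 ** 2
    if var_mix <= 0:
        raise ValueError("divergence is infinite for these parameters")
    return (math.log(sigma2 / sigma1)
            + math.log(sigma2 ** 2 / var_mix) / (2 * (alpha - 1)))


def gaussian_pdf(x, sigma):
    """Density of N(0, sigma^2) at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def renyi_gaussian_numeric(sigma1, sigma2, alpha, half_width=12.0, steps=20000):
    """Same divergence by trapezoidal integration of f1^alpha * f2^(1 - alpha)."""
    lo = -half_width * max(sigma1, sigma2)
    h = -2.0 * lo / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += (weight * gaussian_pdf(x, sigma1) ** alpha
                  * gaussian_pdf(x, sigma2) ** (1 - alpha))
    return math.log(total * h) / (alpha - 1)


if __name__ == "__main__":
    print(renyi_gaussian_closed_form(1.0, 2.0, 0.5))
    print(renyi_gaussian_numeric(1.0, 2.0, 0.5))
```

As $\alpha \to 1$ the closed form approaches the KL divergence $\log(\sigma_2/\sigma_1) + \sigma_1^2/(2\sigma_2^2) - 1/2$, which gives a quick sanity check on both functions.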
V Conclusion
In this paper we proposed a direct estimation method for Rényi and f-divergence measures based on a new graph theoretical interpretation. We proved bias and variance convergence rates, and validated our results by numerical experiments. Developing direct estimation procedures that converge for a fixed number of nearest neighbors is a worthwhile topic for future work.
A. Bias Proof
In this section we give proofs for the Lemmas III.1, III.2, III.3, III.4 and III.5.
For proving Lemma III.1, we need to derive a bound on the moments of -NN distances. We define the -NN ball centered at as
[TABLE]
Let denote the volume of the -NN ball with samples. Set
[TABLE]
Let and respectively denote the interior support and boundary of the support. For a point we have , and for we have . Note that the definition of interior and boundary points depends on and .
Lemma V.1
We have the following relation for any and for each point with density :
[TABLE]
where ,and is some bounded function of the density which is defined in [22].
Proof:
We start with a result from [22], A.25. Let be some arbitrary function, then we have the following relation
[TABLE]
where and are bias correction functions which depend on . We also have for a fixed . For example, if we set , then . Note that this term is negligible compared to other bias terms in our work.
Now according to [22], if we set , then we have and , which yields
[TABLE]
Finally, using the approximation results in (26). ∎
Now for the case of a bounded support, we derive an upper bound on -NN distances for the points at the boundary:
Lemma V.2
For every point and any we have
[TABLE]
Proof:
Define . Let denote any positive function satisfying + for some . Further consider the event as
[TABLE]
and as its complementary event. By using (B.2) in [22] (Appendix B), we have
[TABLE]
Moreover, we can simplify (30) as:
[TABLE]
Further we write as the sum of conditional expectations:
[TABLE]
where in the second line we have used (31) and also the fact that is bounded from above because of the bounded support.
∎
Proof of Lemma III.1:
From the definition of Hölder smoothness, for every we have
[TABLE]
Using Lemmas V.1 and V.2 results in
[TABLE]
where . Note that all other terms in (26) are of higher order and can be ignored. ∎
Proof of Lemma III.2:
[TABLE]
In the second line we have used the triangle inequality for the first term, and the Lipschitz condition for the second term. Again in the third line, we have applied the Lipschitz condition for the first term, and finally in the fourth line we have used the Cauchy-Schwarz inequality.
∎
Proof of Lemma III.3:
Consider the following lemma, which is proved immediately after the proof of Lemma III.3:
Lemma V.3
Let for any point define and . Then can be derived as
[TABLE]
where and are defined as
[TABLE]
and .
Now from Lemma III.1 we can simply write and which results in:
[TABLE]
Remark 4
It can similarly be proven that
[TABLE]
∎
Proof of Lemma V.3:
Let be the sphere with the center (the -NN point of ) and some small radius . Also let and denote the following events:
[TABLE]
Let use the notation to denote .
Suppose be the density function of the RV . Then can be written as:
[TABLE]
where can be formulated using and as
[TABLE]
Let denote the probability of the sphere with density . Then there exist a function real function such that for any we have
[TABLE]
where is volume of the unit ball in dimension . From definition of the density function we have
[TABLE]
So, from (44) and (45) we get .
Now we compute as
[TABLE]
where . Note that .
Similarly, for we can prove that
[TABLE]
where is a function satisfying .
From (43), and considering the fact that (Proof:) and (47) hold true for any , we get
[TABLE]
where . Considering the Taylor expansion of for any real number such that and , we have
[TABLE]
where . Consequently, by using this fact and relation (48) we have
[TABLE]
and and are given by
[TABLE]
∎
Proof of Lemma III.4:
From the definition of a Poisson RV, we can write
[TABLE]
∎
Proof of Lemma III.5:
We use the following theorem from [21] to de-Poissonize the estimator.
Theorem V.4
Assume a sequence is given, and its poisson transform is :
[TABLE]
Consider a linear cone . Let the following conditions hold for some constants , and :
- For ,
[TABLE]
- For ,
[TABLE]
Then we have the following expansion that holds for every fixed :
[TABLE]
where .
Let and respectively represent the RVs and with the parameter .
Using the dePoissonization theorem, we take and . Since we are only interested in the values of , for which , we can assume . So, both the first and second conditions of the Theorem V.4 are satisfied. Then from (56), for :
[TABLE]
where .
∎
Finally at the end of this section, we mention that the bias proof for is pretty similar to the bias proof of and simply follows by the same steps.
B. Ensemble Estimator
In this section we state the MSE proof of the ensemble estimator. Assume that the density functions are from the Hölder space , which consists of those functions on having continuous derivatives up to order and the th partial derivatives are Hölder continuous with exponent , where and . We first compute the bias of interior points, by providing the following lemma.
Lemma V.5
For a constant parameter , let define and . Then for any point and any we have
[TABLE]
where is a constant defined in Lemma III.4 and is given by
[TABLE] where
a_i(Y_1)Y_1.
Proof:
Suppose that the density is times differentiable, and all of the derivatives are bounded. Let . Also let , where is defined as the -NN distance on the point . We can write , where is unit vector. Then the Taylor expansion of around is as follows
[TABLE]
So we apply Lemma V.3 with the following choices for ,
[TABLE]
which results in
[TABLE]
For the interior points, after simplifying given in equation (V.3), and using (26) we get
[TABLE]
where is a constant depending only on .
For boundary points, by using a result in [16] (Bias Proof), we can bound the densities and get the desired upper bound. According to this result, for any and any , we have
[TABLE]
where is the distance from to the boundary, and is a constant. Now note that since and the -NN ball meets the boundary, we have . Therefore, using the triangle inequality for (59) and setting , for every point we have
[TABLE]
where in the third line we have used (63) and the fact that . Using the bound on -NN distances for the boundary points derived in Lemma V.2, we have . After simplifying given in equation (V.3), we get
[TABLE]
The rest of the proof for both interior and boundary points follows similarly by replacing by in (III.3), and finally we get a result similar to (III).
∎
Lemma V.6
The bias of the estimator can be derived as follows
[TABLE]
Proof:
Let define the notations , and . Using Lemma V.5 we have
[TABLE]
Using equations (13) and (III) concludes the bias rate for . ∎
Proof of Theorem II.3:
The proof follows by using the ensemble theorem in ([10], Theorem 4) with the parameters and .
∎
C. Variance Proof
Proof of Theorem II.2:
First note that the variance proof for and is contained in the proof for , and also the proof for is similar to that. So, here we only focus on the variance proof of .
Assume that we have two set of nodes , and for . Without loss of generality, assume that . We consider the virtual random points with the same distribution as , and define . Now for using the Efron-Stein inequality on , we consider another independent copy of as and define . Let and . Then, according to Efron-Stein inequality we have
[TABLE]
Using the Mean Value Theorem, and going back to the definition , there exist some constant , such that
[TABLE]
Therefore, we only need to bound the RHS of (69), which is also an upper bound for .
[TABLE]
First, we give an upper bound on the first term in (Proof:), and the second term would be bounded similarly. Define
[TABLE]
Then we have
[TABLE]
where in the last line we used for . Next, we only need to find bounds on and . In the following lemma we derive the essential bounds.
Lemma V.7
* and satisfy the following relations:*
[TABLE]
∎
Proof of Lemma V.7:
The proof is similar for and . So here we only focus on . We can assume that we re-sample and separately, and both of the events are similar. Let and denote the re-sampling difference in (71) when we only re-sample either or points, respectively. Then it is easy to show that .
Considering the re-sampling of , we can write
[TABLE]
where is the event that none of and fall within nearest neighbor points of , is the event that and fall within and not among the nearest neighbor points of , respectively, is the event that and fall within and not among the nearest neighbor points of , respectively, and finally is the event that both of and fall within nearest neighbor points of . Now Note that in both of the events of and we have . Also since the events and are symmetric, we only consider the event :
[TABLE]
Going back to (74), we have
[TABLE]
By using Taylor expansion, there exist a constant such that
[TABLE]
Note that is bounded from above by . Also from (III) we get . Thus,
[TABLE]
We can similarly show that
[TABLE]
So, as result (Proof:) becomes
[TABLE]
Using a similar approach one can simply show that . So, finally we have . ∎
From (Proof:) and Lemma V.7 we get
[TABLE]
Using a similar approach, we can also simply show that
[TABLE]
Finally using (68), (69) and (Proof:) we get
[TABLE]
References
- [1] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
- [2] A. Rényi, “On measures of entropy and information,” in Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, pp. 547–561, University of California Press, 1961.
- [3] S. M. Ali and S. D. Silvey, “A general class of coefficients of divergence of one distribution from another,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 131–142, 1966.
- [4] K. R. Moon and A. O. Hero, “Ensemble estimation of multivariate f-divergence,” in Information Theory (ISIT), 2014 IEEE International Symposium on, pp. 356–360, IEEE, 2014.
- [5] K. R. Moon, M. Noshad, S. Y. Sekeh, and A. O. Hero III, “Information theoretic structure learning with confidence,” in Proc IEEE Int Conf Acoust Speech Signal Process, 2017.
- [6] J. Beardwood, J. H. Halton, and J. M. Hammersley, “The shortest path through many points,” in Math Proc Cambridge, vol. 55, pp. 299–327, Cambridge Univ Press, 1959.
- [7] A. O. Hero, J. Costa, and B. Ma, “Asymptotic relations between minimal graphs and alpha-entropy,” Comm. and Sig. Proc. Lab. (CSPL), Dept. EECS, University of Michigan, Ann Arbor, Tech. Rep., vol. 334, 2003.
- [8] J. H. Friedman and L. C. Rafsky, “Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests,” The Annals of Statistics, pp. 697–717, 1979.
