Analysis of k-Nearest Neighbor Distances with Application to Entropy Estimation
Shashank Singh, Barnabás Póczos

TL;DR
This paper provides finite-sample bounds on the bias and variance of the Kozachenko-Leonenko entropy estimator, analyzing k-NN distances to establish minimax convergence rates and concentration inequalities.
Contribution
It offers the first finite-sample theoretical analysis of the KL estimator, including bounds on its bias and variance, together with concentration inequalities for k-NN distances.
Findings
Shows that the KL estimator achieves the minimax convergence rate for certain classes of smooth functions
Derives concentration inequalities for k-NN distances
Provides general expectation bounds for k-NN statistics
Abstract
Estimating entropy and mutual information consistently is important for many machine learning applications. The Kozachenko-Leonenko (KL) estimator (Kozachenko & Leonenko, 1987) is a widely used nonparametric estimator for the entropy of multivariate continuous random variables, as well as the basis of the mutual information estimator of Kraskov et al. (2004), perhaps the most widely used estimator of mutual information in this setting. Despite the practical importance of these estimators, major theoretical questions regarding their finite-sample behavior remain open. This paper proves finite-sample bounds on the bias and variance of the KL estimator, showing that it achieves the minimax convergence rate for certain classes of smooth functions. In proving these bounds, we analyze finite-sample behavior of k-nearest neighbors (k-NN) distance statistics (on which the KL estimator is…
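For readers unfamiliar with the estimator discussed in the abstract, the following is a minimal sketch of the commonly stated form of the KL entropy estimator, H = psi(n) - psi(k) + log(V_d) + (d/n) * sum_i log(r_i), where r_i is the Euclidean distance from sample i to its k-th nearest neighbor and V_d is the volume of the unit ball in d dimensions. This is an illustration under stated assumptions, not the authors' code: the function names (`digamma_int`, `log_unit_ball_volume`, `kl_entropy`) are hypothetical, the normalization may differ from the one used in the paper, the neighbor search is brute force, and the sample points are assumed to be distinct.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(m):
    """Digamma at a positive integer m: psi(m) = -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def log_unit_ball_volume(d):
    """Log volume of the Euclidean unit ball in R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)


def kl_entropy(points, k=1):
    """Estimate differential entropy (in nats) from a list of d-dimensional tuples.

    Assumes all points are distinct; a zero k-NN distance would make
    math.log raise ValueError.
    """
    n = len(points)
    d = len(points[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    log_dist_sum = 0.0
    for i, x in enumerate(points):
        # Brute-force distances from x to every other sample, O(n) per point.
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        log_dist_sum += math.log(dists[k - 1])  # distance to the k-th neighbor
    return (digamma_int(n) - digamma_int(k)
            + log_unit_ball_volume(d) + d * log_dist_sum / n)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [(rng.gauss(0.0, 1.0),) for _ in range(500)]
    true_entropy = 0.5 * math.log(2 * math.pi * math.e)  # standard normal, d = 1
    print("estimate:", kl_entropy(sample, k=3), "true:", true_entropy)
```

The sketch uses only the standard library so it runs without extra dependencies; for larger samples, a spatial index such as a k-d tree would replace the quadratic neighbor search.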
Taxonomy
Topics: Advanced Statistical Methods and Models · Face and Expression Recognition · Statistical Methods and Inference
Methods: k-Nearest Neighbors
