Analysis of k-Nearest Neighbor Distances with Application to Entropy Estimation

Shashank Singh, Barnabás Póczos

arXiv:1603.08578 · math.ST · July 22, 2016 · 27 citations

TL;DR

This paper provides finite-sample bounds on the bias and variance of the Kozachenko-Leonenko (KL) entropy estimator, showing that it achieves the minimax convergence rate for certain classes of smooth functions. Along the way, it analyzes k-NN distances and derives concentration inequalities for them.
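The standard heuristic behind the KL estimator can be written out as a short derivation. This is a sketch in one common form, with our own notation: the Euclidean norm is assumed, and some papers normalize with log(n − 1) instead of ψ(n). It is not a restatement of the paper's proofs. The first line is an exact distributional fact for continuous data. The second line replaces log P_i by its mean and assumes the density is roughly constant on the k-NN ball.

```latex
% X_1, \dots, X_n \overset{\text{iid}}{\sim} p \text{ on } \mathbb{R}^d;\;
% \rho_k(i) = \text{distance from } X_i \text{ to its } k\text{-th nearest neighbour among the other } n-1 \text{ points};\;
% c_d = \pi^{d/2} / \Gamma(d/2 + 1) = \text{volume of the Euclidean unit ball};\; \psi = \text{digamma}.
\begin{aligned}
P_i &:= \Pr\!\big(X \in B(X_i, \rho_k(i)) \,\big|\, X_i, \rho_k(i)\big) \sim \mathrm{Beta}(k,\, n-k),
  \qquad \mathbb{E}[\log P_i] = \psi(k) - \psi(n), \\
P_i &\approx p(X_i)\, c_d\, \rho_k(i)^d
  \;\Longrightarrow\; -\log p(X_i) \approx \psi(n) - \psi(k) + \log c_d + d \log \rho_k(i), \\
\hat H_k &= \psi(n) - \psi(k) + \log c_d + \frac{d}{n} \sum_{i=1}^{n} \log \rho_k(i)
  \quad \text{(averaging over } i \text{ estimates } H(p) = \mathbb{E}[-\log p(X)]\text{)}.
\end{aligned}
```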

Contribution

It offers the first finite-sample theoretical analysis of the Kozachenko-Leonenko (KL) estimator, including bounds on its bias and variance, together with concentration inequalities for k-NN distances.

Findings

01

Shows that the KL estimator achieves the minimax convergence rate for certain classes of smooth functions

02

Derives concentration inequalities for k-NN distances

03

Provides general expectation bounds for k-NN statistics
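The following Monte Carlo check illustrates the distributional fact that k-NN distance analyses commonly start from. For continuous data, take the ball around X_i whose radius reaches its k-th nearest neighbour. The probability mass of that ball follows Beta(k, n − k), so its logarithm has mean ψ(k) − ψ(n). This is our own illustration, not the paper's concentration argument. It assumes Uniform[0, 1] data in one dimension, where the ball mass is simply the length of the clipped interval. The helper names `digamma_int` and `mean_log_knn_ball_mass` are hypothetical.

```python
import heapq
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(m: int) -> float:
    """Digamma at a positive integer m: psi(m) = -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def mean_log_knn_ball_mass(n: int, k: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo average of log P(B(X_i, rho_k(i))) for n i.i.d. Uniform[0, 1] points.

    rho_k(i) is the distance from X_i to its k-th nearest neighbour among the
    other n - 1 points. Under Uniform[0, 1] the mass of the closed ball is the
    length of [X_i - rho, X_i + rho] clipped to [0, 1].
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        for i, x in enumerate(xs):
            # Distances from x to every other point; keep the k smallest.
            dists = (abs(x - y) for j, y in enumerate(xs) if j != i)
            rho = heapq.nsmallest(k, dists)[-1]
            mass = min(1.0, x + rho) - max(0.0, x - rho)
            total += math.log(mass)
            count += 1
    return total / count


if __name__ == "__main__":
    n, k = 50, 3
    print("Monte Carlo mean of log mass:", mean_log_knn_ball_mass(n, k, trials=500))
    print("psi(k) - psi(n):             ", digamma_int(k) - digamma_int(n))
```

Boundary clipping does not bias this check, because the Beta law concerns the probability mass of the ball, not its radius. This is exactly why the KL estimator needs extra care near the support boundary, where mass and volume stop being proportional.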

Abstract

Estimating entropy and mutual information consistently is important for many machine learning applications. The Kozachenko-Leonenko (KL) estimator (Kozachenko & Leonenko, 1987) is a widely used nonparametric estimator for the entropy of multivariate continuous random variables, as well as the basis of the mutual information estimator of Kraskov et al. (2004), perhaps the most widely used estimator of mutual information in this setting. Despite the practical importance of these estimators, major theoretical questions regarding their finite-sample behavior remain open. This paper proves finite-sample bounds on the bias and variance of the KL estimator, showing that it achieves the minimax convergence rate for certain classes of smooth functions. In proving these bounds, we analyze finite-sample behavior of k-nearest neighbors (k-NN) distance statistics (on which the KL estimator is…
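Since the abstract describes the KL estimator only in words, here is a minimal brute-force sketch of it in one common form: Ĥ = ψ(n) − ψ(k) + log c_d + (d/n) Σ log ρ_k(i). Here ρ_k(i) is the Euclidean distance from X_i to its k-th nearest neighbour among the other points, and c_d is the volume of the unit ball. Several choices are our assumptions: the ψ(n) normalization (some papers use log(n − 1)), the Euclidean norm, and the O(n²) neighbour search. The helper names `kl_entropy`, `digamma_int` and `log_unit_ball_volume` are hypothetical and come from neither the paper nor any library.

```python
import heapq
import math
import random
from typing import Sequence

EULER_GAMMA = 0.5772156649015329


def digamma_int(m: int) -> float:
    """psi(m) for a positive integer m, via psi(m) = -gamma + sum_{j<m} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def log_unit_ball_volume(d: int) -> float:
    """log c_d, where c_d = pi^(d/2) / Gamma(d/2 + 1) is the Euclidean unit-ball volume."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)


def kl_entropy(points: Sequence[Sequence[float]], k: int = 3) -> float:
    """Kozachenko-Leonenko entropy estimate in nats (brute-force O(n^2) neighbour search).

    Assumes the points are distinct; a zero k-NN distance would make log undefined.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    d = len(points[0])
    sum_log_rho = 0.0
    for i, p in enumerate(points):
        # Distance from X_i to its k-th nearest neighbour among the other n - 1 points.
        dists = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        sum_log_rho += math.log(heapq.nsmallest(k, dists)[-1])
    return digamma_int(n) - digamma_int(k) + log_unit_ball_volume(d) + d * sum_log_rho / n


if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 2, 800
    sample = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    true_h = 0.5 * d * math.log(2.0 * math.pi * math.e)  # entropy of N(0, I_d)
    print("KL estimate:", kl_entropy(sample, k=3), " true entropy:", true_h)
```

On a 2-D standard Gaussian sample, the estimate should land near the true entropy log(2πe) ≈ 2.84 nats. Rescaling the data by a factor a shifts the estimate by exactly d·log a, as the true entropy does. A practical version would swap the quadratic search for a k-d tree.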

Taxonomy

Topics: Advanced Statistical Methods and Models · Face and Expression Recognition · Statistical Methods and Inference

Methods: k-Nearest Neighbors