Statistical Estimation of Conditional Shannon Entropy
Alexander Bulinski, Alexey Kozhevin

TL;DR
This paper introduces new estimators of the conditional Shannon entropy of a discrete response given continuous factors. The estimators are built from spatial order statistics (k-nearest-neighbor statistics), are proven to be asymptotically unbiased and L^2-consistent, and are applicable to feature selection in medical and biological research.
Contribution
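The paper proposes new conditional entropy estimators based on spatial order statistics. Unlike the Kozachenko–Leonenko estimates, which target unconditional entropy, these estimators address the conditional case, and the paper proves that they are asymptotically unbiased and L^2-consistent.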
Findings
Estimates are asymptotically unbiased.
Estimates are L^2-consistent.
Applicable to feature selection in biological data.
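For reference, the two properties listed above have the following standard meaning. In this sketch, M denotes the finite set of values of Y, f the density of X, and \hat H_n an estimate computed from n i.i.d. copies of (X, Y). These symbols are introduced here for illustration, and the conditional entropy is written in its usual form rather than quoted from the paper.

```latex
\begin{aligned}
H(Y \mid X) &= -\int_{\mathbb{R}^d} \sum_{y \in M} P(Y = y \mid X = x)\,\log P(Y = y \mid X = x)\, f(x)\,dx,\\
\text{asymptotic unbiasedness:}\quad & \lim_{n \to \infty} \mathbb{E}\,\hat H_n = H(Y \mid X),\\
L^2\text{-consistency:}\quad & \lim_{n \to \infty} \mathbb{E}\big(\hat H_n - H(Y \mid X)\big)^2 = 0.
\end{aligned}
```

L^2-consistency also implies consistency in probability, by Chebyshev's inequality.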
Abstract
New estimates of the conditional Shannon entropy are introduced within a model in which a discrete response variable depends on a vector of d factors having a density with respect to Lebesgue measure on R^d. Namely, the mixed pair (X,Y) is considered, where X takes values in R^d and Y takes values in an arbitrary finite set. Such models include, for instance, the widely used logistic regression. In contrast to the well-known Kozachenko–Leonenko estimates of unconditional entropy, the proposed estimates are constructed by means of certain spatial order statistics (or k-nearest neighbor statistics, where k = k_n depends on the number of observations n) and a random number of i.i.d. observations contained in balls of specified random radii. The asymptotic unbiasedness and L^2-consistency of the new estimates are established under simple conditions. The obtained…
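The abstract describes estimates built from k-nearest-neighbor radii and from the random number of observations that fall in balls of those radii, but it does not give the formula. The Python sketch below is an assumption and is not the authors' estimator. It implements a standard digamma-corrected kNN construction for a continuous X and a discrete Y. For each observation, the radius is the distance to its k-th nearest neighbor with the same label, and m_i counts all observations inside that ball. The ratio k/m_i then approximates P(Y = y_i | X = x_i). The names `knn_conditional_entropy` and `digamma_int` are hypothetical. Unlike the paper's k = k_n, k is held fixed here, and only the standard library is used.

```python
import math
import random
from collections import Counter

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """psi(n) for a positive integer n: -gamma + sum_{j=1}^{n-1} 1/j."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_conditional_entropy(xs, ys, k=3):
    """Hypothetical kNN estimate of H(Y|X) in nats (not the paper's estimator).

    xs: sequence of points in R^d (tuples of floats); ys: hashable labels.
    For each i, rho_i is the distance from xs[i] to its k-th nearest
    neighbour among the other points carrying the label ys[i], and m_i is
    the number of all other points within distance rho_i (so m_i >= k).
    Since k / m_i approximates P(Y = ys[i] | X = xs[i]), the estimate is
    mean_i psi(m_i) - psi(k), an average of -log P(Y | X) terms.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if min(Counter(ys).values()) <= k:
        raise ValueError("every label needs at least k + 1 observations")
    total = 0.0
    for i in range(n):
        # Distances from xs[i] to all other points, paired with their labels.
        others = [(math.dist(xs[i], xs[j]), ys[j]) for j in range(n) if j != i]
        # Random radius: the k-th smallest distance within the same label class.
        same = sorted(d for d, y in others if y == ys[i])
        rho = same[k - 1]
        # Random count: observations of any label in the closed ball of radius rho.
        m = sum(1 for d, _ in others if d <= rho)
        total += digamma_int(m)
    return total / n - digamma_int(k)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [(rng.random(),) for _ in range(400)]
    # Y = indicator of x > 0.5, flipped with probability 0.1, so the true
    # H(Y|X) is -0.1 log 0.1 - 0.9 log 0.9 (about 0.325 nats).
    ys = [int((x[0] > 0.5) != (rng.random() < 0.1)) for x in xs]
    print(knn_conditional_entropy(xs, ys, k=5))
```

Running the module prints the estimate for a simulated mixed pair where Y is a noisy threshold of a uniform X. Because the true value is known, the output offers a rough sanity check. Pairwise distances are computed by brute force (O(n^2)) for clarity; for larger samples, a spatial index such as a k-d tree would be the usual choice.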
