Convergence Rates for Empirical Estimation of Binary Classification Bounds
Salimeh Yasaei Sekeh, Morteza Noshad, Kevin R. Moon, Alfred O. Hero

TL;DR
This paper analyzes the convergence rates of the Friedman-Rafsky estimator for the Henze-Penrose divergence, providing theoretical bounds and experimental validation for its use in binary classification error estimation.
Contribution
It derives a bound on the convergence rate of the Friedman-Rafsky estimator of the Henze-Penrose divergence, characterizing how quickly the estimate approaches the true divergence as the sample sizes grow.
Findings
Established a concentration inequality for the estimator.
Validated theoretical bounds through experiments.
Demonstrated application to real datasets.
Abstract
Bounding the best achievable error probability for binary classification problems is relevant to many applications including machine learning, signal processing, and information theory. Many bounds on the Bayes binary classification error rate depend on information divergences between the pair of class distributions. Recently, the Henze-Penrose (HP) divergence has been proposed for bounding classification error probability. We consider the problem of empirically estimating the HP-divergence from random samples. We derive a bound on the convergence rate for the Friedman-Rafsky (FR) estimator of the HP-divergence, which is related to a multivariate runs statistic for testing between two distributions. The FR estimator is derived from a multicolored Euclidean minimal spanning tree (MST) that spans the merged samples. We obtain a concentration inequality for the Friedman-Rafsky estimator of…
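The MST construction described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes the normalization commonly used in the HP-divergence literature: with m points from one class, n from the other, and R the number of MST edges that join points from different classes, the estimate is 1 − R(m+n)/(2mn). That formula comes from that literature and is not stated in the abstract above. The function names are hypothetical, and the MST is built with a plain O(N²) Prim's algorithm so that only NumPy is needed.

```python
import numpy as np

def euclidean_mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns the N-1 MST edges as (i, j)."""
    n = points.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)        # distance from each node to the current tree
    parent = np.full(n, -1)          # tree node that achieves that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest node that is not yet in the tree.
        k = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[k] = True
        if parent[k] >= 0:
            edges.append((int(parent[k]), k))
        # Relax distances of the remaining nodes through the new tree node k.
        d = np.linalg.norm(points - points[k], axis=1)
        update = (~in_tree) & (d < best)
        best[update] = d[update]
        parent[update] = k
    return edges

def fr_statistic(X, Y):
    """Friedman-Rafsky count R: MST edges over the pooled sample joining X to Y."""
    Z = np.vstack([X, Y])
    labels = np.concatenate([np.zeros(len(X), dtype=int), np.ones(len(Y), dtype=int)])
    return int(sum(labels[i] != labels[j] for i, j in euclidean_mst_edges(Z)))

def hp_divergence_fr(X, Y):
    """Assumed FR estimate of the HP divergence: 1 - R (m + n) / (2 m n)."""
    m, n = len(X), len(Y)
    return 1.0 - fr_statistic(X, Y) * (m + n) / (2.0 * m * n)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(50, 2))
Y = rng.normal(100.0, 1.0, size=(50, 2))   # far-apart class: MST crosses once
print(fr_statistic(X, Y), hp_divergence_fr(X, Y))
```

For two well-separated classes the MST has a single cross-class edge, so R = 1 and the estimate is close to 1. When both samples come from the same distribution, R is near its null mean 2mn/(m+n), and the estimate is close to 0.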
