Data-Based Optimal Bandwidth for Kernel Density Estimation of Statistical Samples
Zhen-Wei Li, Ping He

TL;DR
This paper introduces a data-driven method to accurately determine the optimal bandwidth for kernel density estimation, improving upon existing analytic formulas by iteratively correcting for unknown density features.
Contribution
The authors develop an iterative correction approach that enables direct computation of the optimal bandwidth from data samples, addressing limitations of traditional analytic formulas.
Findings
For large samples, the data-based bandwidth differs from the analytic-formula value by only 2%-3% in relative terms.
The method performs well for sample sizes larger than 10^4.
The approach can be generalized to variable-kernel estimation.
Abstract
It is common practice to estimate a probability density function, or a spatial matter density function, from statistical samples. Kernel density estimation is a frequently used method, but selecting an optimal kernel bandwidth based entirely on the data sample is a long-standing problem that has not yet been well settled. Analytic formulae for the optimal kernel bandwidth exist, but they cannot be applied directly to data samples, since they depend on the unknown underlying density function from which the samples are drawn. In this work, we devise an approach to select an optimal bandwidth that is entirely data-based. First, we derive correction formulae for the analytic formulae of optimal bandwidth to compute the roughness of the sample's density function. We then substitute the correction formulae into the analytic formulae for the optimal bandwidth, and through iteration, we…
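The general shape of such an iteration can be sketched in Python. This is not the authors' correction scheme, which the truncated abstract does not spell out. It is a naive fixed-point plug-in, assuming that the analytic formula in question is the standard AMISE-optimal bandwidth for a Gaussian kernel, h = [R(K) / (n R(f''))]^(1/5), and that the roughness R(f'') is re-estimated from a kernel estimate at the current bandwidth. All function names are hypothetical.

```python
import numpy as np

SQRT_PI = np.sqrt(np.pi)
# Roughness R(K) of the standard Gaussian kernel (assumed kernel choice).
R_GAUSS_KERNEL = 1.0 / (2.0 * SQRT_PI)


def amise_bandwidth(n, roughness_f2):
    """AMISE-optimal bandwidth for a Gaussian kernel, given R(f'')."""
    return (R_GAUSS_KERNEL / (n * roughness_f2)) ** 0.2


def gaussian_roughness(sigma):
    """Exact R(f'') when f is a normal density with standard deviation sigma."""
    return 3.0 / (8.0 * SQRT_PI * sigma**5)


def estimate_roughness(x, g):
    """Integral of the squared second derivative of a Gaussian KDE with bandwidth g.

    Uses the closed form: a double sum over sample pairs of the fourth
    derivative of the N(0, 2) density, divided by n^2 g^5.
    """
    u = (x[:, None] - x[None, :]) / g
    psi4 = np.exp(-u**2 / 4.0) / (2.0 * SQRT_PI) * (u**4 / 16.0 - 0.75 * u**2 + 0.75)
    return psi4.sum() / (x.size**2 * g**5)


def iterate_bandwidth(x, tol=1e-6, max_iter=200):
    """Naive fixed-point iteration: estimate R(f'') at h, then recompute h."""
    n = x.size
    # Start from the normal-reference value (an assumption, not from the paper).
    h = amise_bandwidth(n, gaussian_roughness(x.std(ddof=1)))
    for _ in range(max_iter):
        h_new = amise_bandwidth(n, estimate_roughness(x, h))
        if abs(h_new - h) <= tol * h:
            return h_new
        h = h_new
    return h
```

For example, `iterate_bandwidth(np.random.default_rng(0).normal(size=1000))` returns a bandwidth estimated from the sample alone. The starting value reduces to the familiar rule h ≈ 1.06 σ n^(-1/5). Because the roughness estimate here is uncorrected, the result should not be expected to match the analytic optimum exactly, and the pairwise sum costs O(n^2) memory per iteration, so this sketch suits only modest sample sizes.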
