Statistical computation of Boltzmann entropy and estimation of the optimal probability density function from statistical sample
Ning Sui, Min Li, Ping He

TL;DR
This paper proposes a data-driven method to determine the optimal bin width or bandwidth for entropy estimation using histogram and kernel methods, based on the minimum of the first derivative of entropy, validated through extensive numerical experiments.
Contribution
It introduces a novel, purely data-based approach to select optimal density estimation parameters by analyzing the entropy derivative, applicable to both univariate and multivariate data.
Findings
The minimum of the first derivative of entropy indicates the optimal bin width or bandwidth.
This method is independent of the unknown underlying distribution.
The findings are verified by a large number of numerical experiments.
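The selection rule in the findings can be sketched for the histogram case as follows. This is a minimal illustration and not the authors' implementation. It assumes the entropy is the differential entropy S = -∫ f ln f dx of the histogram density, and that the derivative is taken with respect to the bin width. Neither detail is stated in this summary. The function names `histogram_entropy` and `select_by_entropy_slope` are hypothetical.

```python
import numpy as np

def histogram_entropy(sample, bin_width):
    """Entropy -sum_i p_i ln(p_i / bin_width) of a histogram density estimate.

    Assumption: this differential-entropy form stands in for the paper's
    entropy definition, which the summary does not spell out.
    """
    lo, hi = sample.min(), sample.max()
    # floor(...) + 1 bins places the last edge beyond the largest sample.
    n_bins = int(np.floor((hi - lo) / bin_width)) + 1
    edges = lo + bin_width * np.arange(n_bins + 1)
    counts, _ = np.histogram(sample, bins=edges)
    p = counts[counts > 0] / sample.size  # empty bins contribute nothing
    return float(-np.sum(p * np.log(p / bin_width)))

def select_by_entropy_slope(sample, widths):
    """Return the width at which the finite-difference slope dS/dwidth is smallest."""
    entropies = np.array([histogram_entropy(sample, w) for w in widths])
    # Assumption: a plain finite-difference derivative on the supplied grid.
    slopes = np.gradient(entropies, widths)
    return widths[int(np.argmin(slopes))], entropies, slopes

# Illustrative usage on a synthetic Gaussian sample (not data from the paper).
rng = np.random.default_rng(0)
sample = rng.normal(size=5000)
widths = np.linspace(0.05, 2.0, 40)
best_width, entropies, slopes = select_by_entropy_slope(sample, widths)
print(f"selected bin width: {best_width:.3f}")
```

For finite samples the histogram entropy curve is noisy at small bin widths, so the finite-difference slope may need smoothing or a coarser grid before its minimum is meaningful.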
Abstract
In this work, we investigate the statistical computation of the Boltzmann entropy of statistical samples. For this purpose, we use both histogram and kernel function estimates of the probability density function of statistical samples. We find that, due to coarse-graining, the entropy is a monotonically increasing function of the bin width for histogram estimation or of the bandwidth for kernel estimation, which seems to make it difficult to select an optimal bin width/bandwidth for computing the entropy. Fortunately, we notice that there exists a minimum of the first derivative of entropy for both histogram and kernel estimation, and this minimum point of the first derivative asymptotically points to the optimal bin width or bandwidth. We have verified these findings with a large number of numerical experiments. Hence, we suggest that the minimum of the first derivative of entropy be used as a selector for the…
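The kernel case described in the abstract can be sketched in the same spirit. This is an illustration under stated assumptions, not the authors' code: it uses a Gaussian kernel, approximates S = -∫ f ln f dx by a Riemann sum on a uniform grid, and differentiates with respect to the bandwidth. The names `kde_entropy` and `select_bandwidth`, and the parameter `grid_size`, are hypothetical.

```python
import numpy as np

def kde_entropy(sample, bandwidth, grid_size=512):
    """Entropy -integral f ln f dx of a Gaussian kernel density estimate.

    Assumptions: Gaussian kernel, uniform evaluation grid padded by five
    bandwidths on each side, integral approximated by a Riemann sum.
    """
    pad = 5.0 * bandwidth
    grid = np.linspace(sample.min() - pad, sample.max() + pad, grid_size)
    dx = grid[1] - grid[0]
    # Standardized distance from every grid point to every sample point.
    z = (grid[:, None] - sample[None, :]) / bandwidth
    norm = sample.size * bandwidth * np.sqrt(2.0 * np.pi)
    density = np.exp(-0.5 * z**2).sum(axis=1) / norm
    positive = density > 0  # skip points where the density underflows to zero
    return float(-np.sum(density[positive] * np.log(density[positive])) * dx)

def select_bandwidth(sample, bandwidths):
    """Return the bandwidth at which the finite-difference slope dS/dh is smallest."""
    entropies = np.array([kde_entropy(sample, h) for h in bandwidths])
    slopes = np.gradient(entropies, bandwidths)
    return bandwidths[int(np.argmin(slopes))], entropies, slopes

# Illustrative usage on a synthetic Gaussian sample (not data from the paper).
rng = np.random.default_rng(0)
sample = rng.normal(size=500)
bandwidths = np.linspace(0.05, 1.0, 20)
best_h, entropies, slopes = select_bandwidth(sample, bandwidths)
print(f"selected bandwidth: {best_h:.3f}")
```

The grid spacing should stay well below the smallest bandwidth scanned. Otherwise the Riemann sum under-resolves the kernels and the entropy values at small bandwidths become unreliable.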
