Exact mean integrated squared error and bandwidth selection for kernel distribution function estimators
Vitaliy Oryshchenko

TL;DR
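This paper derives an exact, closed-form formula for the mean integrated squared error (MISE) of kernel distribution function estimators when the target is a normal mixture and the kernel is Gaussian-based. It compares this MISE with that of the empirical distribution function, the infeasible minimum MISE of kernel estimators, and the asymptotically optimal second order uniform kernel, and it proposes a practical plug-in method for selecting the bandwidth and kernel order.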
Contribution
It introduces a closed-form MISE expression for Gaussian-based kernels and a plug-in method for optimal bandwidth and kernel order selection based on normal mixture approximation.
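For orientation, the quantities involved can be written out using the standard textbook definitions. The notation below ($\hat F_h$, $W$, $K$, $h$, $X_i$) is an assumption made for illustration and is not taken from the paper, and the paper's closed-form expression itself is not reproduced here.

```latex
% Kernel estimator of a distribution function F from a sample X_1, ..., X_n,
% with bandwidth h > 0 and integrated kernel W:
\hat F_h(x) = \frac{1}{n} \sum_{i=1}^{n} W\!\left(\frac{x - X_i}{h}\right),
\qquad
W(u) = \int_{-\infty}^{u} K(t)\,dt .

% Mean integrated squared error as a function of the bandwidth:
\mathrm{MISE}(h) = \mathbb{E} \int_{-\infty}^{\infty}
  \bigl(\hat F_h(x) - F(x)\bigr)^{2}\,dx .
```

In this notation, choosing the bandwidth amounts to minimizing $\mathrm{MISE}(h)$ over $h$, and choosing the kernel order amounts to comparing the minimized values across kernels $K$.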
Findings
In a simulation study, the proposed plug-in method performs well in finite samples.
Exact MISE formulas facilitate better kernel and bandwidth choices.
The analysis offers a guide on when to use higher order kernels in distribution function estimation.
Abstract
An exact, closed form, and easy to compute expression for the mean integrated squared error (MISE) of a kernel estimator of a normal mixture cumulative distribution function is derived for the class of arbitrary order Gaussian-based kernels. Comparisons are made with MISE of the empirical distribution function, the infeasible minimum MISE of kernel estimators, and the asymptotically optimal second order uniform kernel. The results afford straightforward extensions to other classes of kernel functions and distributions. The analysis also offers a guide on when to use higher order kernels in distribution function estimation. A simple plug-in method of simultaneously selecting the optimal bandwidth and kernel order is proposed based on a non-asymptotic approximation of the unknown distribution by a normal mixture. A simulation study shows that the method works well in finite samples,…
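The comparison described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's method: it uses only the ordinary second order Gaussian kernel, a single standard normal target rather than a general normal mixture, and a Monte Carlo approximation of MISE instead of the closed-form expression. All function names, the bandwidth `h=0.3`, the sample size, and the integration grid are hypothetical choices made for illustration.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, used both as the integrated kernel and as the truth."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def edf(sample, x):
    """Empirical distribution function evaluated at x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)


def kernel_cdf(sample, x, h):
    """Kernel distribution function estimate at x with a Gaussian kernel."""
    return sum(normal_cdf((x - xi) / h) for xi in sample) / len(sample)


def integrated_squared_error(estimate, truth, lo=-5.0, hi=5.0, steps=200):
    """Midpoint-rule approximation of the integral of (estimate - truth)^2."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width
        total += (estimate(x) - truth(x)) ** 2
    return total * width


def simulated_mise(n, h, replications=100, seed=0):
    """Monte Carlo MISE of the EDF and the kernel estimator for N(0, 1) data.

    Returns a pair (mise_edf, mise_kernel). This is a simulation-based
    approximation, not the exact formula derived in the paper.
    """
    rng = random.Random(seed)
    ise_edf = 0.0
    ise_kernel = 0.0
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ise_edf += integrated_squared_error(lambda x: edf(sample, x), normal_cdf)
        ise_kernel += integrated_squared_error(
            lambda x: kernel_cdf(sample, x, h), normal_cdf
        )
    return ise_edf / replications, ise_kernel / replications


if __name__ == "__main__":
    mise_edf, mise_kernel = simulated_mise(n=30, h=0.3)
    print("EDF MISE (simulated):   ", mise_edf)
    print("Kernel MISE (simulated):", mise_kernel)
```

Repeating the call to `simulated_mise` over a grid of bandwidths gives a rough picture of how MISE varies with `h`. An exact formula of the kind the paper derives removes the simulation noise from that comparison.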
