Optimal Coreset for Gaussian Kernel Density Estimation
Wai Ming Tai

TL;DR
This paper presents a method for constructing small coresets for Gaussian kernel density estimation in fixed dimension. The resulting size bound of $O(1/\varepsilon)$ improves on previous results by removing the $\sqrt{\log(1/\varepsilon)}$ factor they carried.
Contribution
The author introduces a discrepancy-based coloring technique, built on Banaszczyk's Theorem, to construct coresets of size $O(1/\varepsilon)$ for Gaussian KDE in constant dimension, improving on prior bounds and breaking the $\sqrt{\log}$ barrier.
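To see why a coloring can yield a coreset, the standard halving identity is sketched below. It assumes a balanced coloring $\chi : P \to \{-1,+1\}$ (half the points get each color) and writes $P_+$ for the points colored $+1$. The symbols $\chi$ and $P_+$ are introduced here for illustration only, and the paper's actual construction may differ in its details.
\[ \overline{\mathcal{G}}_P(x) - \overline{\mathcal{G}}_{P_+}(x) = \frac{1}{\left|P\right|}\sum_{p\in P}e^{-\left\lVert x-p \right\rVert^2} - \frac{2}{\left|P\right|}\sum_{p\in P_+}e^{-\left\lVert x-p \right\rVert^2} = -\frac{1}{\left|P\right|}\sum_{p\in P}\chi(p)\,e^{-\left\lVert x-p \right\rVert^2} \]
Under that assumption, a coloring whose signed sum is small in absolute value at every $x$ lets one keep only $P_+$, halving the point set while changing the kernel density estimate only slightly.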
Findings
Coreset size improved to $O(1/\varepsilon)$ when the dimension $d$ is a constant.
Removes the $\sqrt{\log(1/\varepsilon)}$ factor from the previous best bound.
First result to break the $\sqrt{\log}$ barrier, even for $d=2$.
Abstract
Given a point set $P\subset \mathbb{R}^d$, the kernel density estimate of $P$ is defined as \[ \overline{\mathcal{G}}_P(x) = \frac{1}{\left|P\right|}\sum_{p\in P}e^{-\left\lVert x-p \right\rVert^2} \] for any $x\in\mathbb{R}^d$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimate of $P$ is approximated by the kernel density estimate of $Q$. This subset $Q$ is called a coreset. The main technique in this work is constructing a $\pm 1$ coloring on the point set $P$ by discrepancy theory and we leverage Banaszczyk's Theorem. When $d$ is a constant, our construction gives a coreset of size $O\left(\frac{1}{\varepsilon}\right)$ as opposed to the best-known result of $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)$. It is the first result to give a breakthrough on the barrier of $\sqrt{\log}$ factor even when $d=2$.
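The definitions in the abstract can be made concrete with a minimal NumPy sketch. It evaluates the kernel density estimate from the formula above and measures how far a candidate subset's estimate deviates from that of the full set. The function names `gaussian_kde` and `max_kde_error` are hypothetical, and the subset is drawn by uniform random sampling purely as a stand-in; it is not the discrepancy-based construction from the paper. The error is also measured only on a finite sample of query points, so it is a lower bound on the true worst-case error.

```python
import numpy as np

def gaussian_kde(points, queries):
    """Evaluate (1/|P|) * sum_{p in P} exp(-||x - p||^2) at each query x."""
    # Pairwise differences, shape (num_queries, num_points, d).
    diff = queries[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Average the kernel values over the point set.
    return np.exp(-sq_dist).mean(axis=1)

def max_kde_error(points, subset, queries):
    """Largest absolute KDE difference over the sampled query points."""
    full = gaussian_kde(points, queries)
    approx = gaussian_kde(subset, queries)
    return float(np.max(np.abs(full - approx)))

rng = np.random.default_rng(0)
P = rng.normal(size=(2000, 2))              # point set in d = 2
queries = rng.uniform(-3, 3, size=(500, 2)) # sampled evaluation points

# Stand-in subset: uniform random sample (assumption, NOT the paper's method).
Q = P[rng.choice(len(P), size=200, replace=False)]

err = max_kde_error(P, Q, queries)
print(f"max sampled KDE error with |Q| = {len(Q)}: {err:.4f}")
```

A subset $Q$ would count as a coreset for a given $\varepsilon$ if this error stayed at most $\varepsilon$ over all $x$, not only over the sampled queries.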
