Optimal Coreset for Gaussian Kernel Density Estimation

Wai Ming Tai

arXiv:2007.08031 · cs.DS · February 22, 2022

TL;DR

This paper presents a new method for constructing small coresets for Gaussian kernel density estimation in constant dimension. It achieves a size bound of $O(1/\varepsilon)$, which improves on previous results and breaks the $\sqrt{\log(1/\varepsilon)}$ barrier.

Contribution

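The author introduces a discrepancy-based $\pm 1$ coloring technique, built on Banaszczyk's Theorem, to construct coresets of size $O(1/\varepsilon)$ for Gaussian KDE when the dimension $d > 1$ is a constant. This improves on the prior bound of $O(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}})$ and breaks the $\sqrt{\log}$ barrier.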
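The coloring idea can be illustrated with the standard "halving" loop for discrepancy-based coresets: color the points $\pm 1$ with equally many of each sign, keep the $+1$ half, and repeat. The sketch below is a minimal illustration under a loud assumption: it uses a uniformly random balanced coloring as a stand-in for the Banaszczyk-based low-discrepancy coloring of the paper, so it does not achieve the $O(1/\varepsilon)$ bound. The names `balanced_random_coloring`, `halve`, and `coreset_by_halving` are hypothetical.

```python
import numpy as np


def balanced_random_coloring(n: int, rng: np.random.Generator) -> np.ndarray:
    """Return a +/-1 vector of even length n with exactly n/2 entries of each sign.

    Stand-in (assumption): a uniformly random balanced coloring. The paper
    instead obtains a low-discrepancy coloring via Banaszczyk's Theorem.
    """
    assert n % 2 == 0, "a balanced coloring needs an even number of points"
    signs = np.concatenate([np.ones(n // 2, dtype=int), -np.ones(n // 2, dtype=int)])
    return rng.permutation(signs)


def halve(points: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """One halving step: color the points and keep those colored +1.

    For a balanced coloring chi on n points,
    G_P(x) - G_{P+}(x) = -(1/n) * sum_p chi(p) * exp(-||x - p||^2),
    so the error at x is |discrepancy of chi at x| / n.
    """
    if len(points) % 2 == 1:
        points = points[:-1]  # drop one point so the coloring can be balanced
    chi = balanced_random_coloring(len(points), rng)
    return points[chi == 1]


def coreset_by_halving(points: np.ndarray, target_size: int,
                       rng: np.random.Generator) -> np.ndarray:
    """Halve repeatedly until one more halving would fall below target_size."""
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    Q = points
    while len(Q) // 2 >= target_size:
        Q = halve(Q, rng)
    return Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.normal(size=(4096, 2))  # synthetic point set in R^2
    Q = coreset_by_halving(P, 128, rng)
    print(Q.shape)
```

With a random coloring, the discrepancy at a fixed query point is typically on the order of $\sqrt{n}$, so this loop behaves roughly like uniform sampling. The improvement described above comes from replacing `balanced_random_coloring` with a coloring of much smaller discrepancy.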

Findings

01

Coreset size improved to $O(1/\varepsilon)$ for fixed dimensions.

02

Removes the $\sqrt{\log(1/\varepsilon)}$ factor from the previous best bound of $O(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}})$.

03

First result to break the $\sqrt{\log}$ barrier, even for $d = 2$.
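The shape of these bounds can be related through the generic halving accounting for discrepancy-based coresets. The derivation below sketches that arithmetic and is not the paper's proof. It assumes $|P| = n = m\,2^{t}$ and that every halving step on $k$ points admits a balanced coloring $\chi$ with $\sup_{x}\bigl|\sum_{p}\chi(p)\,e^{-\lVert x-p\rVert^2}\bigr| \le D(k)$; the symbol $D$ is introduced here for illustration. Writing $P^{+} = \{p \in P : \chi(p) = +1\}$:

\[
\overline{\mathcal{G}}_P(x) - \overline{\mathcal{G}}_{P^{+}}(x) = -\frac{1}{|P|}\sum_{p\in P}\chi(p)\,e^{-\lVert x-p\rVert^2},
\qquad\text{so one step on } k \text{ points costs at most } \frac{D(k)}{k}.
\]

Halving $t$ times, from $n$ points down to a set $Q$ of $m = n/2^{t}$ points, accumulates

\[
\sup_{x}\bigl|\overline{\mathcal{G}}_P(x) - \overline{\mathcal{G}}_Q(x)\bigr| \;\le\; \sum_{i=0}^{t-1}\frac{D(n/2^{i})}{n/2^{i}}.
\]

If $D(k) \le D$ is a constant, the sum is geometric: $\sum_{i=0}^{t-1} D\,2^{i}/n = D(2^{t}-1)/n < D/m$, so $m = D/\varepsilon = O(1/\varepsilon)$ points suffice. If instead $D(k) = O(\sqrt{\log k})$, the final levels dominate, the error is $O(\sqrt{\log m}/m)$, and $m = O(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}})$.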

Abstract

Given a point set $P \subset \mathbb{R}^{d}$, the kernel density estimate of $P$ is defined as \[ \overline{\mathcal{G}}_P(x) = \frac{1}{|P|}\sum_{p\in P}e^{-\lVert x-p \rVert^2} \] for any $x \in \mathbb{R}^{d}$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimate of $P$ is approximated by the kernel density estimate of $Q$ to within additive error $\varepsilon$, i.e. $\sup_{x\in\mathbb{R}^d}|\overline{\mathcal{G}}_P(x)-\overline{\mathcal{G}}_Q(x)| \le \varepsilon$. This subset $Q$ is called a coreset. The main technique in this work is constructing a $\pm 1$ coloring on the point set $P$ by discrepancy theory, leveraging Banaszczyk's Theorem. When $d > 1$ is a constant, our construction gives a coreset of size $O(\frac{1}{\varepsilon})$, as opposed to the best-known result of $O(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}})$. It is the first result to break the barrier of the $\sqrt{\log}$ factor, even when $d = 2$.
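The definition above can be evaluated directly, which is a convenient way to check how well a candidate subset $Q$ approximates $P$. The sketch below is a minimal NumPy version under these assumptions: points are rows of an array, there is no bandwidth parameter (exactly as in the formula), and the supremum over $\mathbb{R}^d$ is replaced by a maximum over a finite set of query points, which only gives a lower bound on the true error. The function names `kde_eval` and `max_kde_error` are hypothetical; note that this is not SciPy's `scipy.stats.gaussian_kde`, which uses a normalized density with a data-dependent bandwidth.

```python
import numpy as np


def kde_eval(points: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Evaluate (1/|P|) * sum_{p in P} exp(-||x - p||^2) at every query x.

    points:  (n, d) array holding the point set P (or a coreset Q).
    queries: (m, d) array of evaluation points x.
    Returns a length-m array of KDE values, each in (0, 1].
    """
    # Squared distances ||x - p||^2 for every (query, point) pair: shape (m, n)
    diff = queries[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Average the kernel values over the points for each query
    return np.exp(-sq_dist).mean(axis=1)


def max_kde_error(P: np.ndarray, Q: np.ndarray, queries: np.ndarray) -> float:
    """Largest |G_P(x) - G_Q(x)| over the supplied queries.

    Only finitely many x are checked, so this is a lower bound on the
    supremum over all of R^d used in the coreset definition.
    """
    return float(np.max(np.abs(kde_eval(P, queries) - kde_eval(Q, queries))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.normal(size=(2000, 2))                      # synthetic point set in R^2
    Q = P[rng.choice(len(P), size=200, replace=False)]  # naive uniform-sample baseline
    queries = rng.uniform(-3.0, 3.0, size=(500, 2))
    print("max error on queries:", max_kde_error(P, Q, queries))
```

The broadcasting approach materializes an $m \times n$ distance matrix, which is simple but memory-bound; for large inputs one would evaluate the queries in chunks.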