Sub-linear RACE Sketches for Approximate Kernel Density Estimation on Streaming Data
Benjamin Coleman, Anshumali Shrivastava

TL;DR
The paper introduces RACE, a sketching algorithm that compresses high-dimensional streaming data into a small array of integer counters, from which kernel density can be estimated with theoretical guarantees.
Contribution
RACE is a sketching method for kernel density estimation that operates online on high-dimensional streams, is practical to implement, and comes with strong theoretical guarantees.
Findings
Achieves 10x better compression than competing methods
Effective for high-dimensional streaming data
Provides strong theoretical guarantees
Abstract
Kernel density estimation is a simple and effective method that lies at the heart of many important machine learning applications. Unfortunately, kernel methods scale poorly for large, high dimensional datasets. Approximate kernel density estimation has a prohibitively high memory and computation cost, especially in the streaming setting. Recent sampling algorithms for high dimensional densities can reduce the computation cost but cannot operate online, while streaming algorithms cannot handle high dimensional datasets due to the curse of dimensionality. We propose RACE, an efficient sketching algorithm for kernel density estimation on high-dimensional streaming data. RACE compresses a set of N high dimensional vectors into a small array of integer counters. This array is sufficient to estimate the kernel density for a large class of kernels. Our sketch is practical to implement and…
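To make the "array of integer counters" idea concrete, here is a minimal sketch of a RACE-style structure. It is not the authors' implementation. It assumes the kernel is the collision probability of a locality-sensitive hash, and it picks signed random projections as that hash purely for illustration. The class name RaceSketch, its parameters (rows, bits, seed), and the plain averaging over rows are all hypothetical choices made for this example.

```python
import random


class RaceSketch:
    """Illustrative RACE-style sketch: `rows` arrays of integer counters,
    each indexed by an independent locality-sensitive hash (assumed here to be
    signed random projections, which is a choice made for this example)."""

    def __init__(self, dim, rows=50, bits=4, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        self.n = 0  # number of vectors seen so far
        # One counter array of width 2**bits per row.
        self.counts = [[0] * (1 << bits) for _ in range(rows)]
        # Each row owns `bits` random Gaussian hyperplanes.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
            for _ in range(rows)
        ]

    def _bucket(self, row, x):
        # Concatenate the sign bits of the projections into a bucket index.
        code = 0
        for plane in self.planes[row]:
            dot = sum(p * v for p, v in zip(plane, x))
            code = (code << 1) | (1 if dot >= 0 else 0)
        return code

    def add(self, x):
        # Streaming update: one counter increment per row; x is not stored.
        for r in range(self.rows):
            self.counts[r][self._bucket(r, x)] += 1
        self.n += 1

    def query(self, q):
        # Average the counters that q hashes to, normalized by stream length.
        if self.n == 0:
            return 0.0
        total = sum(self.counts[r][self._bucket(r, q)] for r in range(self.rows))
        return total / (self.rows * self.n)


if __name__ == "__main__":
    rng = random.Random(0)
    sketch = RaceSketch(dim=3, rows=100, bits=4, seed=1)
    for _ in range(500):
        sketch.add([1.0 + rng.gauss(0.0, 0.1) for _ in range(3)])
    print("density near the data:", sketch.query([1.0, 1.0, 1.0]))
    print("density far from data:", sketch.query([-1.0, -1.0, -1.0]))
```

Memory in this sketch is rows × 2^bits integers regardless of how many vectors are inserted, and each update touches one counter per row. The kernel being estimated is tied to the hash family, so a different kernel would need a different hash.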
