DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search
Matti Karppa, Martin Aumüller, Rasmus Pagh

TL;DR
DEANN introduces a novel approach that combines Approximate Nearest Neighbor algorithms with Random Sampling to significantly accelerate Kernel Density Estimation, especially in high-dimensional data, by efficiently identifying influential points.
Contribution
The paper presents a new algorithm, DEANN, that leverages ANN algorithms as a black box to speed up KDE computations, with a theoretical foundation and practical implementation.
Findings
Outperforms existing KDE methods on high-dimensional datasets.
Matches Random Sampling performance when ANN does not provide speedup.
Provides a flexible C++/Python implementation for broad applicability.
Abstract
Kernel Density Estimation (KDE) is a nonparametric method for estimating the shape of a density function, given a set of samples from the distribution. Recently, locality-sensitive hashing, originally proposed as a tool for nearest neighbor search, has been shown to enable fast KDE data structures. However, these approaches do not take advantage of the many other advances that have been made in algorithms for nearest neighbor search. We present an algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN) where we apply Approximate Nearest Neighbor (ANN) algorithms as a black box subroutine to compute an unbiased KDE. The idea is to find points that have a large contribution to the KDE using ANN, compute their contribution exactly, and approximate the remainder with Random Sampling (RS). We present a theoretical argument that supports the idea that an ANN…
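The split described in the abstract (exact contribution from near neighbors, sampled estimate for the rest) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the Gaussian kernel, the bandwidth parameter `h`, and the parameters `k` (number of neighbors) and `m` (number of random samples) are all assumptions made for illustration, and a brute-force nearest neighbor search stands in for the black-box ANN subroutine.

```python
import numpy as np

def gaussian_kernel(q, X, h):
    """Gaussian kernel values between query q and every row of X (assumed kernel)."""
    d2 = np.sum((X - q) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * h * h))

def exact_kde(q, X, h):
    """Reference KDE value: the mean kernel value over the whole dataset."""
    return gaussian_kernel(q, X, h).mean()

def deann_style_estimate(q, X, h, k, m, rng):
    """Hypothetical sketch of the nearest-neighbors-plus-sampling idea.

    Requires 1 <= k < n and m >= 1. Brute-force k-NN is used here as a
    stand-in for an ANN index; a real ANN library would replace that step.
    """
    n = X.shape[0]

    # Step 1 (stand-in for ANN): indices of the k points closest to q.
    d2 = np.sum((X - q) ** 2, axis=1)
    nn_idx = np.argpartition(d2, k - 1)[:k]

    # Step 2: sum the contribution of those near points exactly.
    near_sum = gaussian_kernel(q, X[nn_idx], h).sum()

    # Step 3: estimate the remaining n - k contributions by uniform sampling
    # (with replacement) from the points that were not returned in step 1.
    mask = np.ones(n, dtype=bool)
    mask[nn_idx] = False
    rest_idx = np.flatnonzero(mask)
    sample = rng.choice(rest_idx, size=m, replace=True)
    rest_sum = (n - k) / m * gaussian_kernel(q, X[sample], h).sum()

    # Combine both parts and normalize by the dataset size.
    return (near_sum + rest_sum) / n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 8))
    q = X[0] + 0.1
    print("exact   :", exact_kde(q, X, h=2.0))
    print("estimate:", deann_style_estimate(q, X, h=2.0, k=50, m=100, rng=rng))
```

Because the sample is drawn only from points outside the neighbor set and rescaled by `(n - k) / m`, the sampled term is an unbiased estimate of the remaining sum whichever neighbors are returned, which is why an approximate neighbor search can be substituted in step 1 of this sketch.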
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Image Retrieval and Classification Techniques
