Let them have CAKES: A Cutting-Edge Algorithm for Scalable, Efficient, and Exact Search on Big Data
Morgan E. Prior, Thomas J. Howard III, Oliver McLaughlin, Terrence Ferguson, Najib Ishaq, Noah M. Daniels

TL;DR
CAKES introduces three novel, exact $k$-NN search algorithms that are scalable, efficient, and applicable to any distance function, demonstrating near-constant scaling and high recall on large, complex datasets.
Contribution
The paper presents CAKES, a set of three exact $k$-NN algorithms that scale with dataset complexity rather than size or dimension, and are applicable to any distance function.
Findings
Near-constant scaling with dataset size on manifold data
Perfect recall on metric space data
Higher recall than state-of-the-art methods for non-metric distances
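The findings above are stated in terms of recall. As a point of reference, the sketch below assumes the usual definition for $k$-NN search: the fraction of the true $k$ nearest neighbours that a search actually returns. This page does not give the paper's precise definition (for example, how ties at the $k$-th distance are handled), so the function name `recall_at_k` and its behaviour are illustrative assumptions.

```python
def recall_at_k(returned_ids, true_ids):
    """Fraction of the true k nearest neighbours that a search returned.

    Hypothetical helper; assumes neighbours are identified by unique ids
    and ignores ties at the k-th distance.
    """
    true_set = set(true_ids)
    if not true_set:
        raise ValueError("true_ids must be non-empty")
    hits = sum(1 for i in set(returned_ids) if i in true_set)
    return hits / len(true_set)


# An exact search returns exactly the true neighbours, so its recall is 1.0.
print(recall_at_k([4, 7, 9], [9, 4, 7]))  # 1.0
# An approximate search that misses one of three true neighbours.
print(recall_at_k([4, 7, 2], [9, 4, 7]))
```

Under this definition an exact algorithm scores 1.0 by construction, which is what "perfect recall" refers to for metric distances.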
Abstract
The ongoing Big Data explosion has created a demand for efficient and scalable algorithms for similarity search. Most recent work has focused on approximate $k$-NN search, and while this may be sufficient for some applications, exact $k$-NN search would be ideal for many applications. We present CAKES, a set of three novel, exact algorithms for $k$-NN search. CAKES's algorithms are generic over any distance function, and they do not scale with the cardinality or embedding dimension of the dataset, but rather with its metric entropy and fractal dimension. We test these claims on datasets from the ANN-Benchmarks suite under commonly used distance functions, as well as on a genomic dataset with Levenshtein distance and a radio-frequency dataset with Dynamic Time Warping distance. We demonstrate that CAKES exhibits near-constant scaling with cardinality…
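To make "exact $k$-NN search generic over any distance function" concrete, the sketch below shows the naive baseline: an exhaustive scan that accepts an arbitrary distance callable, here paired with Levenshtein edit distance on strings. This is not CAKES. Its cost grows linearly with the cardinality of the dataset, which is exactly the dependence CAKES's algorithms are designed to avoid; their internals are not described on this page. The names `exact_knn` and `levenshtein` are hypothetical.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]


def exact_knn(query: T, data: Sequence[T], k: int,
              dist: Callable[[T, T], float]) -> List[Tuple[float, int]]:
    """Return the k (distance, index) pairs closest to query by exhaustive scan.

    Works with any distance function; every item is compared once,
    so the cost is linear in len(data).
    """
    if k <= 0:
        return []
    # Max-heap of the best k seen so far, stored as negated distances
    # because heapq is a min-heap; heap[0] holds the current worst.
    heap: List[Tuple[float, int]] = []
    for idx, item in enumerate(data):
        d = dist(query, item)
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, idx))
    return sorted((-neg_d, idx) for neg_d, idx in heap)


words = ["kitten", "sitting", "mitten", "bitten", "knitting"]
print(exact_knn("kitten", words, 2, levenshtein))  # [(0, 0), (1, 2)]
```

Swapping in another distance (for example `lambda a, b: abs(a - b)` for numbers, or a Dynamic Time Warping function for time series) requires no change to `exact_knn`; only the callable changes.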
Taxonomy
Topics: Data Management and Algorithms · Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques
