Unconventional application of k-means for distributed approximate similarity search
Felipe Ortega, Maria Jesus Algar, Isaac Martín de Diego, and Javier M. Moguerza

TL;DR
This paper introduces MASK, a novel multilevel indexing method for approximate similarity search in metric spaces, leveraging an unconventional application of k-means clustering to improve efficiency in high-dimensional, high-sparsity datasets.
Contribution
The paper presents a new indexing approach called MASK that exploits properties of k-means clustering to perform efficient approximate similarity search in metric spaces.
Findings
Effective in high-dimensional, high-sparsity datasets
Promising results on synthetic and real-world datasets
Leverages k-means properties for multilevel indexing
Abstract
Similarity search based on a distance function in metric spaces is a fundamental problem for many applications. Queries for similar objects lead to the well-known machine learning task of nearest-neighbour identification. Many data indexing strategies, collectively known as Metric Access Methods (MAM), have been proposed to speed up queries for similar elements in this context. Moreover, since exact approaches to solving similarity queries can be complex and time-consuming, alternatives have emerged to reduce query execution time, such as returning approximate results or resorting to distributed computing platforms. In this paper, we introduce MASK (Multilevel Approximate Similarity search with k-means), an unconventional application of the k-means algorithm as the foundation of a multilevel index structure for approximate similarity search, suitable for metric spaces. We…
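The abstract does not describe how MASK builds or queries its index, so the sketch below is not the authors' method. It is a conventional hierarchical k-means tree, included only to illustrate the general idea of using k-means as the basis of a multilevel index for approximate nearest-neighbour search. All names (`build`, `search`, `exact_knn`, `branching`, `leaf_size`, `n_probe`) are hypothetical, Euclidean distance is an assumed stand-in metric, and the centroid (mean) update assumes vector data; the sketch does not attempt to reproduce how MASK remains applicable to general metric spaces.

```python
import math
import random


def dist(a, b):
    # Euclidean distance as a stand-in metric (assumption, not from the paper).
    return math.dist(a, b)


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, groups of member points)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    groups = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's group.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        for c, members in enumerate(groups):
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, groups


def build(points, branching=8, leaf_size=50, depth=0, max_depth=3):
    """Hypothetical index: recursively split points with k-means into a tree."""
    if len(points) <= leaf_size or depth == max_depth:
        return {"leaf": points}
    k = min(branching, len(points))
    centroids, groups = kmeans(points, k, seed=depth)
    children = [
        (centroid, build(members, branching, leaf_size, depth + 1, max_depth))
        for centroid, members in zip(centroids, groups)
        if members
    ]
    return {"children": children}


def search(node, query, k=1, n_probe=1):
    """Approximate k-NN: at each level, descend only into the n_probe nearest centroids."""
    if "leaf" in node:
        return sorted(node["leaf"], key=lambda p: dist(p, query))[:k]
    ranked = sorted(node["children"], key=lambda child: dist(child[0], query))
    candidates = []
    for _, subtree in ranked[:n_probe]:
        candidates.extend(search(subtree, query, k, n_probe))
    # A probed leaf may hold fewer than k points, so the result can be shorter than k.
    return sorted(candidates, key=lambda p: dist(p, query))[:k]


def exact_knn(points, query, k=1):
    """Brute-force baseline: compare the query against every point."""
    return sorted(points, key=lambda p: dist(p, query))[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [tuple(rng.random() for _ in range(8)) for _ in range(1000)]
    index = build(data, branching=8, leaf_size=50)
    q = tuple(rng.random() for _ in range(8))
    approx = search(index, q, k=5, n_probe=2)
    exact = exact_knn(data, q, k=5)
    print("recall@5:", len(set(approx) & set(exact)) / len(exact))
```

The `n_probe` parameter trades accuracy for speed: a small value prunes most of the tree and computes far fewer distances than `exact_knn`, at the risk of missing true neighbours that fall in unvisited clusters. Setting `n_probe` equal to `branching` visits every leaf, so the search degenerates into an exact (exhaustive) one.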
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies
