A Memory-Efficient Distributed Algorithm for Approximate Nearest Neighbour Search with Arbitrary Distances
Elena Garcia-Morato, Maria Jesus Algar, Cesar Alfaro, Felipe Ortega, Javier Gomez, Javier M. Moguerza

TL;DR
This paper presents PDASC, a distributed approximate nearest neighbour (ANN) search algorithm that supports arbitrary dissimilarity functions, is memory-efficient, and scales across distributed systems. It improves accuracy and efficiency on high-dimensional, heterogeneous datasets.
Contribution
PDASC introduces a novel, clustering-based distributed index that is agnostic to distance properties, enabling flexible, memory-efficient ANN search in distributed environments.
Findings
Achieves competitive accuracy with lower per-node memory.
Supports arbitrary dissimilarity functions, including non-metric.
Provides scalable, energy-efficient ANN search without hardware acceleration.
Abstract
Approximate nearest neighbour (ANN) search has become a central task in modern data-intensive applications, particularly when operating on large, heterogeneous, or high-dimensional datasets. However, many existing ANN methods struggle in such scenarios, either because they rely on metric assumptions or because their indexing strategies are not well suited to distributed environments or to settings with constrained memory resources. This work introduces PDASC (Parametrizable Distributed Approximate Similarity Search with Clustering), a distributed ANN search algorithm whose index design simultaneously supports arbitrary dissimilarity functions and efficient deployment in distributed, storage-aware environments. PDASC builds a distributed hierarchical index based on clustering mechanisms that are agnostic to distance properties, thereby accommodating non-metric and domain-specific…
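The abstract is cut off above, and this page does not give PDASC's actual construction or search procedure. Purely as an illustration of how a clustering-based index can avoid metric assumptions, the sketch below uses a generic single-level k-medoids scheme: cluster representatives are always real data points, so only calls to a user-supplied dissimilarity are needed and no means or triangle-inequality pruning are involved. The names `build_index`, `search`, `n_probe`, and `sq_euclidean` are hypothetical, and this is not the authors' algorithm.

```python
import random
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]
Dissimilarity = Callable[[Point, Point], float]


def sq_euclidean(a: Point, b: Point) -> float:
    """Squared Euclidean distance: a non-metric dissimilarity (no triangle inequality)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], medoids: List[int], dist: Dissimilarity) -> Dict[int, List[int]]:
    """Group point indices under the index of their closest medoid."""
    clusters: Dict[int, List[int]] = {m: [] for m in medoids}
    for i, p in enumerate(points):
        closest = min(medoids, key=lambda m: dist(p, points[m]))
        clusters[closest].append(i)
    return clusters


def build_index(points: List[Point], n_clusters: int, dist: Dissimilarity,
                n_iter: int = 5, seed: int = 0) -> Dict[int, List[int]]:
    """Illustrative k-medoids index: maps medoid index -> member indices."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), n_clusters)
    for _ in range(n_iter):
        clusters = assign(points, medoids, dist)
        # New medoid: the member with the smallest total dissimilarity to its cluster.
        # Empty clusters are dropped, so fewer than n_clusters may remain.
        medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            for members in clusters.values() if members
        ]
    return assign(points, medoids, dist)


def search(query: Point, points: List[Point], index: Dict[int, List[int]],
           dist: Dissimilarity, k: int = 1, n_probe: int = 2) -> List[int]:
    """Approximate k-NN: scan only the n_probe clusters whose medoids are closest."""
    probed = sorted(index, key=lambda m: dist(query, points[m]))[:n_probe]
    candidates = [i for m in probed for i in index[m]]
    return sorted(candidates, key=lambda i: dist(query, points[i]))[:k]
```

In this sketch, raising `n_probe` trades speed for recall; probing every cluster reduces to an exhaustive scan. In a distributed setting one could imagine each node holding only a subset of the clusters, but how PDASC partitions its index is not described on this page.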
Taxonomy
Topics: Data Management and Algorithms · Time Series Analysis and Forecasting · Data Mining Algorithms and Applications
