# Pyramid: A General Framework for Distributed Similarity Search

**Authors:** Shiyuan Deng, Xiao Yan, Kelvin K.W. Ng, Chenyu Jiang, James Cheng

arXiv: 1906.10602 · 2019-06-26

## TL;DR

Pyramid is a distributed similarity search framework built on HNSW graphs. It supports Euclidean distance, angular distance, and inner product, and it targets high query throughput, robustness to node failures and stragglers, and a concise API for large datasets.

## Contribution

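The paper introduces Pyramid, a distributed similarity search framework based on HNSW rather than on KD-trees or locality sensitive hashing (LSH). Pyramid partitions a dataset into sub-datasets of similar items for index building and routes each query to only some of those sub-datasets.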

## Key findings

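- Achieves high query processing throughput on large-scale datasets while producing quality search results.
- Stays robust under node failures and stragglers through failure recovery and straggler mitigation.
- Supports popular similarity functions, including Euclidean distance, angular distance, and inner product.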
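For readers unfamiliar with the three similarity functions named above, the following is a minimal sketch in plain Python. The function names are illustrative and are not taken from Pyramid's API. Angular distance is computed here as the angle between two vectors, which is one common convention and an assumption on our part; the paper may define it differently.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors (smaller is more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product of two equal-length vectors (larger is more similar)."""
    return sum(x * y for x, y in zip(a, b))


def angular_distance(a, b):
    """Angle in radians between two non-zero vectors (smaller is more similar).

    Assumption: 'angular distance' means the angle itself; other
    conventions (for example 1 - cosine similarity) also exist.
    """
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    cosine = inner_product(a, b) / (norm_a * norm_b)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)
```

Note that Euclidean and angular distance rank items by the smallest value, whereas inner product search ranks by the largest value, so a system supporting all three has to handle both orderings.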

## Abstract

Similarity search is a core component in various applications such as image matching, product recommendation and low-shot classification. However, single-machine solutions are usually insufficient due to the large cardinality of modern datasets and the stringent latency requirements of on-line query processing. We present Pyramid, a general and efficient framework for distributed similarity search. Pyramid supports search with popular similarity functions including Euclidean distance, angular distance and inner product. Unlike existing distributed solutions that are based on KD-tree or locality sensitive hashing (LSH), Pyramid is based on Hierarchical Navigable Small World graph (HNSW), which is the state-of-the-art similarity search algorithm on a single machine. To achieve high query processing throughput, Pyramid partitions a dataset into sub-datasets containing similar items for index building and assigns a query to only some of the sub-datasets for query processing. To provide the robustness required by production deployment, Pyramid also supports failure recovery and straggler mitigation. Pyramid offers a concise set of APIs such that users can easily use Pyramid without knowing the details of distributed execution. Experiments on large-scale datasets show that Pyramid produces quality results for similarity search, achieves high query processing throughput and is robust under node failures and stragglers.
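The partition-then-route idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the fixed `centers`, the nearest-center assignment rule, the `n_probe` parameter, and all function names are assumptions made for illustration, and a brute-force scan stands in for the per-partition HNSW index that Pyramid would build on each machine.

```python
def squared_distance(a, b):
    """Squared Euclidean distance; enough for ranking nearest neighbors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition(items, centers):
    """Group items into sub-datasets by nearest center, so similar items land together."""
    parts = [[] for _ in centers]
    for item in items:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(item, centers[i]))
        parts[nearest].append(item)
    return parts


def search(query, centers, parts, k, n_probe):
    """Route the query to the n_probe closest sub-datasets and merge their results."""
    order = sorted(range(len(centers)),
                   key=lambda i: squared_distance(query, centers[i]))
    candidates = []
    for i in order[:n_probe]:
        # Hypothetical stand-in: a real deployment would query an HNSW
        # index on the worker holding this sub-dataset.
        candidates.extend(parts[i])
    candidates.sort(key=lambda item: squared_distance(query, item))
    return candidates[:k]


# Toy data: three well-separated groups and one hand-picked center per group.
items = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0)]
centers = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]

parts = partition(items, centers)
result = search((4.9, 5.1), centers, parts, k=2, n_probe=1)
print(result)
```

With `n_probe=1` only one of the three sub-datasets is scanned, which is the source of the throughput gain: each query costs work on a fraction of the machines. Raising `n_probe` trades throughput for recall when a query falls near a partition boundary.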

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.10602/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/1906.10602/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/1906.10602/full.md

---
Source: https://tomesphere.com/paper/1906.10602