# Billion-scale similarity search with GPUs

**Authors:** Jeff Johnson, Matthijs Douze, Hervé Jégou

arXiv: 1702.08734 · 2018-06-07

## TL;DR

This paper presents GPU-optimized designs for brute-force, approximate, and compressed-domain (product quantization) similarity search. The resulting nearest neighbor implementation is 8.5x faster than the prior GPU state of the art and makes high-accuracy k-NN graph construction practical on up to a billion vectors.

## Contribution

It introduces a GPU k-selection design that operates at up to 55% of theoretical peak performance, addressing a step that exposed too little parallelism in prior GPU approaches to similarity search.
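To make the term concrete, here is a minimal CPU sketch of what k-selection computes: the k smallest values in a list of distances, together with their positions. It is not the paper's GPU design; the function name `k_select` and the sample distances are hypothetical, and the standard-library `heapq.nsmallest` stands in for the selection step.

```python
import heapq

def k_select(distances, k):
    """Return the k smallest (distance, index) pairs in ascending order.

    Illustrative CPU stand-in for k-selection; not the paper's GPU algorithm.
    """
    # Pair each distance with its position so the caller learns *which*
    # vectors are nearest, not only how near they are.
    pairs = ((d, i) for i, d in enumerate(distances))
    return heapq.nsmallest(k, pairs)

# Hypothetical distances from one query to five database vectors.
distances = [0.9, 0.1, 0.5, 0.3, 0.7]
print(k_select(distances, 2))  # [(0.1, 1), (0.3, 3)]
```

On a CPU this is a cheap step; the abstract's point is that on a GPU the same operation is hard to parallelize, which is why it can become the bottleneck.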

## Key findings

- The nearest neighbor implementation is 8.5x faster than the prior GPU state of the art.
- A high-accuracy k-NN graph on 95 million images from the Yfcc100M dataset was built in 35 minutes.
- A graph connecting 1 billion vectors was built in under 12 hours on 4 Maxwell Titan X GPUs.
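For readers unfamiliar with k-NN graphs, the following small sketch shows the structure being built: each vector is linked to its k nearest neighbors. It is an exhaustive CPU version over four hypothetical 2-D points, with assumed helper names (`squared_l2`, `knn_graph`), and says nothing about how the paper reaches billion-vector scale.

```python
import heapq

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_graph(vectors, k):
    """Exhaustive k-NN graph: graph[i] lists the k nearest neighbors of vector i."""
    graph = []
    for i, query in enumerate(vectors):
        # Compare against every other vector, skipping the query itself.
        candidates = (
            (squared_l2(query, v), j) for j, v in enumerate(vectors) if j != i
        )
        graph.append([j for _, j in heapq.nsmallest(k, candidates)])
    return graph

# Hypothetical toy data; real inputs would be high-dimensional feature vectors.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
print(knn_graph(points, 2))  # [[1, 2], [0, 2], [0, 1], [2, 1]]
```

This exhaustive version costs a number of distance computations quadratic in the number of vectors, which is why large-scale graph construction depends on fast search.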

## Abstract

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art. We apply it in different similarity search scenarios, by proposing optimized design for brute-force, approximate and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.
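The compressed-domain search mentioned in the abstract relies on product quantization. As a rough sketch of the general idea, a vector is split into sub-vectors, each sub-vector is replaced by the index of its nearest centroid, and a query is then compared against these codes through per-sub-space lookup tables. The codebooks below are hand-written assumptions (in practice they would be learned, for example with k-means), and all function names are hypothetical rather than taken from the paper's implementation.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec, m):
    """Split vec into m contiguous sub-vectors of equal length."""
    step = len(vec) // m
    return [vec[s * step:(s + 1) * step] for s in range(m)]

def pq_encode(vec, codebooks):
    """Encode vec as one centroid index per sub-quantizer."""
    subs = split(vec, len(codebooks))
    return [
        min(range(len(cb)), key=lambda c: squared_l2(sub, cb[c]))
        for sub, cb in zip(subs, codebooks)
    ]

def build_tables(query, codebooks):
    """Per-sub-space tables of distances from the query to every centroid."""
    subs = split(query, len(codebooks))
    return [[squared_l2(sub, c) for c in cb] for sub, cb in zip(subs, codebooks)]

def adc_distance(tables, code):
    """Approximate query-to-vector distance: sum one table entry per sub-space."""
    return sum(table[c] for table, c in zip(tables, code))

# Assumed toy codebooks: 2 sub-quantizers, 2 centroids each, 2 dimensions each.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
code = pq_encode([0.9, 1.1, 0.1, 0.8], codebooks)
tables = build_tables([1.0, 1.0, 0.0, 1.0], codebooks)
print(code, adc_distance(tables, code))  # [1, 0] 0.0
```

The tables are built once per query and reused for every stored code, so comparing against a compressed vector costs one table lookup per sub-quantizer instead of a full distance computation.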

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.08734/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1702.08734/full.md

## References

48 references; the full list is in the complete paper: https://tomesphere.com/paper/1702.08734/full.md

---
Source: https://tomesphere.com/paper/1702.08734