Learning Cluster Representatives for Approximate Nearest Neighbor Search

Thomas Vecchiato

arXiv:2412.05921·cs.IR·December 10, 2024

TL;DR

This paper introduces a novel learning-to-rank approach for clustering-based approximate nearest neighbor search, significantly improving accuracy by learning cluster representatives through a simple linear function.

Contribution

The paper presents a method that uses learning-to-rank to optimize cluster representatives, with the aim of improving both the efficiency and the accuracy of approximate nearest neighbor search.

Findings

01 Learning cluster representatives with a linear function improves search accuracy.

02 The method effectively reduces the search space in high-dimensional data.

03 The method demonstrates state-of-the-art performance in maximum inner product search.

Abstract

Developing increasingly efficient and accurate algorithms for approximate nearest neighbor search is a paramount goal in modern information retrieval. A primary approach to addressing this question is clustering, which involves partitioning the dataset into distinct groups, with each group characterized by a representative data point. By this method, retrieving the top-k data points for a query requires identifying the most relevant clusters based on their representatives -- a routing step -- and then conducting a nearest neighbor search within these clusters only, drastically reducing the search space. The objective of this thesis is not only to provide a comprehensive explanation of clustering-based approximate nearest neighbor search but also to introduce and delve into every aspect of our novel state-of-the-art method, which originated from a natural observation: The routing…
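
The two-stage procedure described in the abstract (route the query to a few clusters, then search only inside them) can be sketched roughly as follows. This is a minimal illustration of the standard clustering-based baseline, not the method proposed in the thesis: it assumes plain k-means with mean centroids as the representatives and inner-product scoring for both routing and the final search. The function names `build_clusters` and `search` and the parameter `n_probe` are hypothetical and chosen only for this example.

```python
import numpy as np


def _nearest(data, reps):
    # Index of the closest representative (squared Euclidean) for each point.
    dists = ((data[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def build_clusters(data, n_clusters, n_iters=10, seed=0):
    """Partition `data` with a basic Lloyd-style k-means.

    Returns (representatives, assignments). The representatives here are
    plain cluster means (an assumption of this sketch, not learned).
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(data), size=n_clusters, replace=False)
    reps = data[start].copy()
    for _ in range(n_iters):
        assign = _nearest(data, reps)
        for c in range(n_clusters):
            members = data[assign == c]
            if len(members) > 0:  # keep the old representative if empty
                reps[c] = members.mean(axis=0)
    return reps, _nearest(data, reps)


def search(query, data, reps, assign, k, n_probe):
    """Return indices of (up to) k points with the largest inner product
    with `query`, looking only inside the `n_probe` best-routed clusters."""
    # Routing step: rank clusters by the score of their representative.
    probed = np.argsort(-(reps @ query))[:n_probe]
    # Candidate set: every point that belongs to a probed cluster.
    candidates = np.flatnonzero(np.isin(assign, probed))
    # Exact search restricted to the candidates.
    scores = data[candidates] @ query
    return candidates[np.argsort(-scores)[:k]]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    points = rng.normal(size=(1000, 16))
    q = rng.normal(size=16)
    reps, assign = build_clusters(points, n_clusters=10)
    print(search(q, points, reps, assign, k=5, n_probe=3))
```

Probing every cluster (`n_probe == n_clusters`) reduces to an exact search, while a smaller `n_probe` scans fewer points and may miss true neighbors whose cluster was not selected. The quality of the routing step therefore depends on how well each representative stands in for its cluster.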

Code & Models

Repositories

tomvek/mips-learnt-ivf (tf · Official)

Taxonomy

Topics: Data Management and Algorithms · Face and Expression Recognition