A Multilabel Classification Framework for Approximate Nearest Neighbor Search
Ville Hyvönen, Elias Jääsaari, Teemu Roos

TL;DR
This paper introduces a multilabel classification approach for approximate nearest neighbor search that improves partition-based index structures by directly modeling candidate set selection as a classification problem.
Contribution
It formulates candidate set selection as a multilabel classification problem and proves a consistency condition for partitioning classifiers in ANN search.
Findings
Empirically, the natural classifier induced by a partition improves ANN search performance.
The approach can be combined with existing unsupervised or supervised partitioning strategies.
The consistency condition is verified for chronological k-d trees.
Abstract
Both supervised and unsupervised machine learning algorithms have been used to learn partition-based index structures for approximate nearest neighbor (ANN) search. Existing supervised algorithms formulate the learning task as finding a partition in which the nearest neighbors of a training set point belong to the same partition element as the point itself, so that the nearest neighbor candidates can be retrieved by naive lookup or backtracking search. We formulate candidate set selection in ANN search directly as a multilabel classification problem where the labels correspond to the nearest neighbors of the query point, and interpret the partitions as partitioning classifiers for solving this task. Empirical results suggest that the natural classifier based on this interpretation leads to strictly improved performance when combined with any unsupervised or supervised partitioning…
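The multilabel formulation in the abstract can be illustrated with a small sketch. This is only one plausible reading of the "natural classifier", not the authors' implementation: each corpus point is a label, a training query carries the labels of its k nearest neighbors, and each partition cell stores the fraction of its training queries that carry each label. The random-hyperplane partition, the threshold rule, and all function names (`knn_labels`, `cell_ids`, `fit_natural_classifier`, `candidate_set`) are assumptions made for illustration.

```python
import numpy as np

def knn_labels(queries, corpus, k):
    """0/1 matrix: entry [i, j] is 1 when corpus point j is one of the
    k nearest neighbors (Euclidean) of query i."""
    d2 = ((queries[:, None, :] - corpus[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]
    labels = np.zeros((len(queries), len(corpus)))
    np.put_along_axis(labels, nn, 1.0, axis=1)
    return labels

def cell_ids(points, hyperplanes):
    """Assign each point to a cell using the sign pattern of its projections
    (a stand-in partition; any partitioning strategy could be used here)."""
    bits = (points @ hyperplanes.T > 0).astype(int)
    return bits @ (1 << np.arange(hyperplanes.shape[0]))

def fit_natural_classifier(train, corpus, hyperplanes, k):
    """Per-cell label probability estimates: the mean label vector of the
    training queries that fall into the cell (zeros for empty cells)."""
    labels = knn_labels(train, corpus, k)
    cells = cell_ids(train, hyperplanes)
    probs = np.zeros((1 << hyperplanes.shape[0], len(corpus)))
    for c in np.unique(cells):
        probs[c] = labels[cells == c].mean(axis=0)
    return probs

def candidate_set(query, probs, hyperplanes, threshold):
    """Route the query to its cell and keep corpus points whose estimated
    probability of being a nearest neighbor exceeds the threshold."""
    c = cell_ids(query[None, :], hyperplanes)[0]
    return np.flatnonzero(probs[c] > threshold)

# Synthetic data, purely for demonstration.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(200, 8))
train = rng.normal(size=(500, 8))
hyperplanes = rng.normal(size=(3, 8))   # 3 hyperplanes -> up to 8 cells

probs = fit_natural_classifier(train, corpus, hyperplanes, k=5)
cands = candidate_set(rng.normal(size=8), probs, hyperplanes, threshold=0.05)
```

Under this reading, the candidate set is not restricted to corpus points that lie in the query's own cell: any corpus point that is frequently a nearest neighbor of training queries routed to that cell can be selected, which is the difference from naive lookup.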
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Machine Learning and Algorithms
Methods: Principal Components Analysis
