Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation
Ben Harwood, Amir Dezfouli, Iadine Chades, Conrad Sanderson

TL;DR
This paper evaluates five approximate nearest neighbour methods on dynamic datasets, taking index update costs into account, and finds that some methods outperform the baseline in online data collection and online feature learning scenarios.
Contribution
The paper provides an empirical comparison of ANN methods on dynamic datasets, assessing their suitability in terms of update costs and application context.
Findings
Hierarchical Navigable Small World Graphs outperform the baseline in online data collection.
Scalable Nearest Neighbours is faster than the baseline at recall rates below 75% in online feature learning.
k-d trees are unsuitable for dynamic datasets due to slow update times.
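Recall thresholds such as the 75% above are commonly measured as recall@k: the fraction of the true k nearest neighbours (found by exhaustive search) that the approximate method returns, averaged over queries. The following is a minimal sketch under stated assumptions: Euclidean distance, exhaustive search as ground truth, and per-query recall averaged over queries. This summary does not give the paper's exact definition, and `exact_knn`, `recall_at_k` and `mean_recall` are hypothetical helper names.

```python
import math
import random

def exact_knn(data, query, k):
    """Ground truth: ids of the k points nearest to `query` (Euclidean), by exhaustive search."""
    return sorted(range(len(data)), key=lambda i: math.dist(data[i], query))[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbours that appear in the approximate result."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def mean_recall(results, truths):
    """Average recall@k over a set of queries (one approximate result per query)."""
    return sum(recall_at_k(a, t) for a, t in zip(results, truths)) / len(truths)

random.seed(0)
data = [[random.random() for _ in range(8)] for _ in range(1000)]
query = [random.random() for _ in range(8)]
truth = exact_knn(data, query, k=10)
# Hypothetical ANN output that missed the 3 closest points and returned ranks 4..13 instead.
approx = exact_knn(data, query, k=13)[3:]
print(recall_at_k(approx, truth))  # 0.7
```

Under this assumed definition, an operating point "below 75% recall" is one where the mean recall@k over the query set is under 0.75.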
Abstract
Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large-scale, high-dimensional datasets. ANN methods typically differ in the index structure used for accelerating searches, resulting in various recall/runtime trade-off points. For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics. However, for applications with dynamic datasets, which are subject to frequent online changes (like addition of new samples), there is currently no consensus as to which ANN methods are most suitable. Traditional evaluation approaches consider neither the computational costs of updating the index structure nor the rate and size of index updates. To address this, we empirically evaluate 5 popular ANN methods on two main…
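The abstract's point that traditional evaluations ignore update costs can be made concrete with a harness that interleaves index insertions with queries and times both phases. This is a minimal sketch, not the paper's protocol. The `add`/`search` interface, the batch and query counts, and the throughput measure (queries per second of combined update and query time) are all assumptions for illustration. `BruteForceIndex` is a hypothetical exhaustive-search reference, and any ANN index exposing the same two methods could be swapped in.

```python
import math
import random
import time

class BruteForceIndex:
    """Exhaustive-search reference: updates are cheap appends, each query scans every point."""

    def __init__(self):
        self.points = []

    def add(self, batch):
        self.points.extend(batch)

    def search(self, query, k):
        # Sort all point ids by Euclidean distance to the query and keep the k closest.
        order = sorted(range(len(self.points)),
                       key=lambda i: math.dist(self.points[i], query))
        return order[:k]


def run_dynamic_workload(index, batches, queries_per_batch, k, dim, seed=0):
    """Interleave index updates with queries, timing each phase separately so that
    update cost is reported alongside query cost instead of being ignored."""
    rng = random.Random(seed)
    update_s = query_s = 0.0
    n_queries = 0
    for batch in batches:
        t0 = time.perf_counter()
        index.add(batch)                      # update phase
        update_s += time.perf_counter() - t0
        # Generate queries outside the timed region so only search time is measured.
        queries = [[rng.random() for _ in range(dim)] for _ in range(queries_per_batch)]
        t0 = time.perf_counter()
        for q in queries:                     # query phase
            index.search(q, k)
        query_s += time.perf_counter() - t0
        n_queries += len(queries)
    total_s = update_s + query_s
    return {"update_s": update_s, "query_s": query_s, "queries": n_queries,
            "throughput_qps": n_queries / total_s if total_s > 0 else float("inf")}


# Usage: 5 batches of 200 new 16-d points, with 20 queries after each batch.
rng = random.Random(1)
batches = [[[rng.random() for _ in range(16)] for _ in range(200)] for _ in range(5)]
index = BruteForceIndex()
stats = run_dynamic_workload(index, batches, queries_per_batch=20, k=10, dim=16)
print(stats)
```

Because throughput here divides by update time plus query time, an index with slow updates (such as the k-d trees flagged in the findings) would accumulate a large `update_s` and lose throughput even if its individual queries are fast.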
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies
