Toward Metric Indexes for Incremental Insertion and Querying
Edward Raff, Charles Nicholas

TL;DR
This paper investigates metric index structures for scenarios that interleave insertions and queries, a pattern that arises in applications such as malware analysis triage, and modifies existing algorithms to support incremental updates under arbitrary distance metrics.
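For context, the simplest structure that supports interleaved insertion and k-nearest-neighbor querying under an arbitrary distance function is a linear scan: insertion is O(1), but every query computes a distance to every stored item. The Python sketch below illustrates that baseline; the class and method names (`LinearScanIndex`, `insert`, `knn`) are hypothetical and not taken from the paper.

```python
import heapq
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class LinearScanIndex(Generic[T]):
    """Hypothetical baseline: O(1) insertion, n distance computations per query.

    It never relies on metric properties, so any distance function works.
    """

    def __init__(self, dist: Callable[[T, T], float]) -> None:
        self.dist = dist
        self.items: List[T] = []

    def insert(self, x: T) -> int:
        # Append the new item; its position doubles as its id.
        self.items.append(x)
        return len(self.items) - 1

    def knn(self, q: T, k: int) -> List[Tuple[float, int]]:
        # Compute every distance and keep the k smallest as (distance, id) pairs.
        return heapq.nsmallest(k, ((self.dist(q, x), i) for i, x in enumerate(self.items)))


if __name__ == "__main__":
    # Hamming distance on equal-length strings, interleaved with a query.
    idx = LinearScanIndex(lambda a, b: sum(c1 != c2 for c1, c2 in zip(a, b)))
    for s in ["abcd", "abce", "wxyz"]:
        idx.insert(s)
    print(idx.knn("abcf", 2))  # prints [(1, 0), (1, 1)]
```

A metric index aims to answer the same queries while computing far fewer distances, at the price of extra work (and structure to maintain) on each insertion.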
Contribution
It modifies three existing algorithms to support incremental insertion and querying with arbitrary metrics, and identifies the best-performing structure for this scenario.
Findings
Improved Vantage-Point tree of Minimum-Variance performs best
Supports arbitrary distance metrics in incremental scenarios
Evaluated on multiple datasets and metrics
Abstract
In this work we explore the use of metric index structures, which accelerate nearest neighbor queries, in the scenario where we need to interleave insertions and queries during deployment. This use case is inspired by a real-life need in malware analysis triage, and is surprisingly understudied. Existing literature tends to focus only on final query efficiency, often does not support incremental insertion, or does not support arbitrary distance metrics. We modify and improve three algorithms to support our scenario of incremental insertion and querying with arbitrary metrics, and evaluate them on multiple datasets and distance metrics while varying the desired number of nearest neighbors. In doing so we determine that our improved Vantage-Point tree of Minimum-Variance performs best for this scenario.
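A vantage-point (VP) tree partitions points by their distance to a chosen vantage point and uses the triangle inequality to skip subtrees at query time. The Python sketch below is a generic, simplified VP tree that accepts incremental insertions (leaf buckets that split at the median distance when they overflow) and answers k-nearest-neighbor queries under a user-supplied metric. It is not the authors' improved Minimum-Variance variant: the bucket-splitting policy, the choice of the oldest point as vantage point, and all names (`IncrementalVPTree`, `leaf_size`, `knn`) are assumptions for illustration. Pruning is only correct if `dist` satisfies the triangle inequality.

```python
import heapq
import math
from typing import Callable, Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class _Node(Generic[T]):
    """Leaf: a bucket of (id, point) pairs. Internal: a vantage point plus split radius."""

    def __init__(self) -> None:
        self.bucket: Optional[List[Tuple[int, T]]] = []  # set to None once the node splits
        self.vp: Optional[Tuple[int, T]] = None
        self.mu = 0.0  # points with dist(p, vp) < mu go inside, the rest go outside
        self.inside: Optional["_Node[T]"] = None
        self.outside: Optional["_Node[T]"] = None


class IncrementalVPTree(Generic[T]):
    """Hypothetical simplified VP tree with incremental insertion and k-NN search."""

    def __init__(self, dist: Callable[[T, T], float], leaf_size: int = 8) -> None:
        self.dist = dist
        self.leaf_size = max(1, leaf_size)  # a split needs at least one non-vantage point
        self.root: _Node[T] = _Node()
        self.size = 0

    def insert(self, x: T) -> int:
        """Route x to a leaf by its distance to each vantage point; return its id."""
        item = (self.size, x)
        self.size += 1
        node = self.root
        while node.bucket is None:
            node = node.inside if self.dist(x, node.vp[1]) < node.mu else node.outside
        node.bucket.append(item)
        if len(node.bucket) > self.leaf_size:
            self._split(node)
        return item[0]

    def _split(self, node: _Node[T]) -> None:
        """Turn an overflowing leaf into an internal node split at the median distance."""
        vp, *rest = node.bucket  # assumption: the oldest point becomes the vantage point
        dists = [self.dist(p, vp[1]) for _, p in rest]
        mu = sorted(dists)[len(dists) // 2]
        inside, outside = _Node(), _Node()
        for item, d in zip(rest, dists):
            (inside if d < mu else outside).bucket.append(item)
        node.bucket, node.vp, node.mu = None, vp, mu
        node.inside, node.outside = inside, outside

    def knn(self, q: T, k: int) -> List[Tuple[float, int]]:
        """Return up to k (distance, id) pairs, nearest first."""
        heap: List[Tuple[float, int]] = []  # max-heap of the best k, stored as (-distance, id)

        def consider(item: Tuple[int, T], d: float) -> None:
            if len(heap) < k:
                heapq.heappush(heap, (-d, item[0]))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, item[0]))

        def tau() -> float:
            # Current k-th best distance; infinite until k candidates have been seen.
            return -heap[0][0] if len(heap) == k else math.inf

        def search(node: _Node[T]) -> None:
            if node.bucket is not None:  # leaf: scan the whole bucket
                for item in node.bucket:
                    consider(item, self.dist(q, item[1]))
                return
            d = self.dist(q, node.vp[1])
            consider(node.vp, d)
            # Visit q's side first, then skip the other side when the triangle
            # inequality shows it cannot hold anything closer than tau().
            if d < node.mu:
                search(node.inside)
                if d + tau() >= node.mu:
                    search(node.outside)
            else:
                search(node.outside)
                if d - tau() <= node.mu:
                    search(node.inside)

        if k > 0:
            search(self.root)
        return sorted((-neg_d, i) for neg_d, i in heap)


if __name__ == "__main__":
    # Interleave insertions and queries; any metric (here |a - b|) can be plugged in.
    tree = IncrementalVPTree(lambda a, b: abs(a - b), leaf_size=2)
    for v in [10.0, 2.0, 7.0, 4.0, 9.0]:
        tree.insert(v)
        print(tree.knn(5.0, 2))
```

Because this sketch never rebalances, the tree's shape depends on the insertion order, which is one reason incremental insertion is harder to support well than a one-shot bulk build.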
Taxonomy
Topics: Data Management and Algorithms · Caching and Content Delivery · Algorithms and Data Compression
