Learned k-NN Distance Estimation

Daichi Amagata; Yusuke Arai; Sumio Fujita; Takahiro Hara

arXiv:2208.14210·cs.DB·November 29, 2022

TL;DR

This paper introduces a neural network-based method to estimate k-NN distances rapidly and accurately, significantly reducing data access costs in large-scale proximity analysis tasks.

Contribution

It presents a novel machine learning approach that predicts k-NN distances with O(1) inference time, avoiding extensive data retrieval.

Findings

01. Achieves high accuracy in distance estimation

02. Inference time is constant, O(1)

03. Demonstrates efficiency on real datasets

Abstract

Big data mining is an important task in data science because it can surface useful observations and new knowledge hidden in large datasets. Proximity-based data analysis in particular is used in many real-life applications. Such analysis usually relies on the distances to the k nearest neighbors, so its main bottleneck is data retrieval. Much effort has been made to improve the efficiency of these analyses, but they still incur large costs because they inherently require many data accesses. To avoid this issue, we propose a machine-learning technique that quickly and accurately estimates the k-NN distances (i.e., the distances to the k nearest neighbors) of a given query. We train a fully connected neural network model and use pivots to achieve accurate estimation. Our model is designed to have useful advantages: it infers distances to…
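To make the abstract's terms concrete, here is a minimal sketch of the two quantities it mentions: the exact k-NN distances of a query, and a pivot-based description of that query. It is not the authors' implementation. The function names, the random choice of pivots, and the idea of using query-to-pivot distances as model inputs are assumptions made for illustration; the abstract states only that pivots and a fully connected network are used.

```python
import heapq
import math
import random


def knn_distances(query, data, k):
    """Exact distances from `query` to its k nearest points in `data`, ascending.

    Brute force: every point in `data` is accessed once, which is the
    data-retrieval cost the abstract identifies as the bottleneck.
    """
    return heapq.nsmallest(k, (math.dist(query, p) for p in data))


def pivot_features(query, pivots):
    """Distances from `query` to each pivot.

    Assumption: these distances serve as the input encoding for a model.
    """
    return [math.dist(query, p) for p in pivots]


def make_training_pair(query, data, pivots, k):
    """Build one (features, targets) example for a regression model."""
    return pivot_features(query, pivots), knn_distances(query, data, k)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.random(), rng.random()) for _ in range(1000)]
    pivots = rng.sample(data, 8)  # assumption: pivots picked at random
    features, targets = make_training_pair((0.5, 0.5), data, pivots, k=5)
    print(len(features), len(targets))
```

A regression model trained on such pairs would see only the fixed-length feature vector at query time, so its cost would depend on the number of pivots and the model size, not on the dataset size. That is one plausible reading of the constant-time inference claim, not a description confirmed by this page.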

Code & Models

Repositories

arailly/pivnet (PyTorch, official)

Taxonomy

Topics: Data Management and Algorithms · Data Stream Mining Techniques · Anomaly Detection Techniques and Applications

Methods: k-Nearest Neighbors