A geometric framework for modelling similarity search

Vladimir Pestov

arXiv:cs/9904002·cs.IR·November 17, 2016

TL;DR

This paper introduces a geometric framework for modelling similarity search in large, multidimensional data spaces, drawing on concepts from metric geometry to analyze complexity, indexability, and the curse of dimensionality.

Contribution

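The paper proposes a geometric framework built on the notion of a similarity workload, a probability metric space (the query domain) with a distinguished finite subspace (the dataset), and applies concepts from metric geometry to analyze similarity search in large, high-dimensional datasets.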

Findings

01. Provides a geometric perspective on similarity workloads

02. Analyzes the curse of dimensionality using measure concentration

03. Bridges database research with metric geometry techniques

Abstract

The aim of this paper is to propose a geometric framework for modelling similarity search in large and multidimensional data spaces of general nature, which seems to be flexible enough to address such issues as analysis of complexity, indexability, and the 'curse of dimensionality'. Such a framework is provided by the concept of the so-called similarity workload, which is a probability metric space $\Omega$ (query domain) with a distinguished finite subspace $X$ (dataset), together with an assembly of concepts, techniques, and results from metric geometry. They include such notions as metric transform, $\varepsilon$-entropy, and the phenomenon of concentration of measure on high-dimensional structures. In particular, we discuss the relevance of the latter to understanding the curse of dimensionality. As some of those concepts and techniques are being currently reinvented by the database…
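
The link between concentration of measure and the curse of dimensionality mentioned in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: the choice of the uniform measure on the unit cube as the query domain, the Euclidean metric, and the names `make_workload` and `relative_spread` are all assumptions made for illustration. The sketch draws a finite dataset and some queries from the same measure, then reports how tightly the query-to-point distances cluster around their mean.

```python
import math
import random
import statistics


def make_workload(dim, n_points, rng):
    """Draw a finite dataset X from the uniform measure on [0, 1]^dim (an assumed query domain)."""
    return [[rng.random() for _ in range(dim)] for _ in range(n_points)]


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relative_spread(dim, n_points=200, n_queries=20, seed=0):
    """Standard deviation divided by mean of query-to-dataset distances.

    A small value means the distances are concentrated near their mean,
    so near and far neighbours become hard to tell apart.
    """
    rng = random.Random(seed)
    dataset = make_workload(dim, n_points, rng)
    distances = []
    for _ in range(n_queries):
        # Queries are drawn from the same measure as the dataset.
        query = [rng.random() for _ in range(dim)]
        distances.extend(euclidean(query, x) for x in dataset)
    return statistics.pstdev(distances) / statistics.fmean(distances)


if __name__ == "__main__":
    for dim in (2, 20, 200):
        print(dim, round(relative_spread(dim), 3))
```

Under these assumptions the ratio shrinks as the dimension grows, which is one informal way to see why distance-based pruning tends to lose its power in high-dimensional workloads.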
