Natural data structure extracted from neighborhood-similarity graphs

Tom Lorimer; Karlis Kanders; Ruedi Stoop

arXiv:1803.00500·stat.ML·February 20, 2019

TL;DR

This paper introduces a non-iterative method that encodes data-point neighborhood similarities as a sparse graph, so that high-dimensional data can be analyzed and interpreted transparently without altering the original data dimension or metric.

Contribution

The approach directly encodes data-point neighborhood similarities as a sparse graph, avoiding both the distortion introduced by dimensionality reduction and the biases introduced by clustering methods.

Findings

01 Effective on natural and synthetic datasets

02 Preserves original data structure and metric

03 Provides transparent data interpretation

Abstract

'Big' high-dimensional data are commonly analyzed in low dimensions, after performing a dimensionality-reduction step that inherently distorts the data structure. For the same purpose, clustering methods are also often used. These methods also introduce a bias, either by starting from the assumption of a particular geometric form of the clusters, or by using iterative schemes to enhance cluster contours, with uncontrollable consequences. The goal of data analysis should, however, be to encode and detect structural data features at all scales and densities simultaneously, without assuming a parametric form of data point distances, or modifying them. We propose a novel approach that directly encodes data point neighborhood similarities as a sparse graph. Our non-iterative framework permits a transparent interpretation of data, without altering the original data dimension and metric.…
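To make the idea of "encoding neighborhood similarities as a sparse graph" concrete, here is a minimal Python sketch. It is not the authors' algorithm, whose details are not given in this abstract. It assumes one common stand-in for neighborhood similarity: the Jaccard overlap of k-nearest-neighbor sets under the Euclidean metric. The function names `knn_sets` and `neighborhood_similarity_graph`, and the parameters `k` and `min_similarity`, are hypothetical choices made for illustration only.

```python
import numpy as np


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors.

    Assumption: Euclidean distance in the original data space (no projection).
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    order = np.argsort(dists, axis=1)[:, :k]
    return [set(row.tolist()) for row in order]


def neighborhood_similarity_graph(points, k=5, min_similarity=0.3):
    """Hypothetical sketch: a sparse graph weighted by neighborhood overlap.

    An edge (i, j) is considered only if j is among the k nearest neighbors
    of i (or vice versa). Its weight is the Jaccard similarity of the two
    closed neighborhoods (each neighbor set plus the point itself).
    Built in a single pass, with no iterative refinement.
    """
    neighbors = knn_sets(points, k)
    closed = [nbrs | {i} for i, nbrs in enumerate(neighbors)]
    edges = {}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            key = (min(i, j), max(i, j))
            if key in edges:
                continue  # already evaluated from the other endpoint
            shared = len(closed[i] & closed[j])
            total = len(closed[i] | closed[j])
            weight = shared / total
            if weight >= min_similarity:
                edges[key] = weight
    return edges


# Usage: two well-separated point clouds in 10 dimensions.
rng = np.random.default_rng(0)
cloud_a = rng.normal(0.0, 0.1, size=(20, 10))
cloud_b = rng.normal(5.0, 0.1, size=(20, 10))
graph = neighborhood_similarity_graph(np.vstack([cloud_a, cloud_b]), k=5)
print(len(graph), "edges kept")
```

In this sketch the distances are used only to rank neighbors; the data are never projected or rescaled, which is in the spirit of the abstract's "without altering the original data dimension and metric". The edge dictionary stores only pairs that pass the threshold, so the graph stays sparse.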
