Automatic topography of high-dimensional data sets by non-parametric Density Peak clustering

Maria d'Errico, Elena Facco, Alessandro Laio, and Alex Rodriguez

arXiv:1802.10549·stat.ML·March 2, 2021

TL;DR

This paper introduces a non-parametric, unsupervised method that automatically builds a topographical map of a high-dimensional data set: a chart of the peaks and valleys of its probability density, in which an error estimate on the density separates genuine peaks from finite-sampling fluctuations.

Contribution

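It extends Density Peak clustering into an unsupervised procedure and pairs it with a non-parametric density estimator, so that the number and height of the density peaks and the depth of the valleys separating them are found automatically, giving a robust, hierarchical, and visual description of the data.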

Findings

1. Automatically identifies the number and height of density peaks

2. Provides a measure of density estimation error for reliability

3. Enhances understanding of complex high-dimensional data structures

Abstract

Data analysis in high-dimensional spaces aims at obtaining a synthetic description of a data set, revealing its main structure and its salient features. We here introduce an approach providing this description in the form of a topography of the data, namely a human-readable chart of the probability density from which the data are harvested. The approach is based on an unsupervised extension of Density Peak clustering and a non-parametric density estimator that measures the probability density in the manifold containing the data. This allows finding automatically the number and the height of the peaks of the probability density, and the depth of the "valleys" separating them. Importantly, the density estimator provides a measure of the error, which allows distinguishing genuine density peaks from density fluctuations due to finite sampling. The approach thus provides robust and visual…
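The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in a plain fixed-k nearest-neighbour density estimator (the paper uses its own non-parametric estimator), it uses the ambient dimension unless a value `d` is supplied (the paper works in the manifold containing the data), and it takes the standard approximation 1/sqrt(k) as the error on the log-density. The function names `knn_log_density` and `density_peak_labels` are hypothetical, and the sketch assumes the data contain no duplicate points.

```python
import math
import numpy as np

def knn_log_density(X, k, d=None):
    """Fixed-k nearest-neighbour log-density and an approximate error on it.

    Assumption: d defaults to the ambient dimension; a manifold-aware
    method would pass an intrinsic-dimension estimate instead.
    """
    n = X.shape[0]
    d = X.shape[1] if d is None else d
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(dist, axis=1)           # column 0 is the point itself
    r_k = dist[np.arange(n), order[:, k]]      # distance to the k-th neighbour
    # log-volume of the unit ball in d dimensions
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_rho = math.log(k) - math.log(n) - log_unit_ball - d * np.log(r_k)
    # standard kNN approximation: relative error on the density ~ 1/sqrt(k)
    log_rho_err = np.full(n, 1.0 / math.sqrt(k))
    return log_rho, log_rho_err, dist, order

def density_peak_labels(log_rho, dist, order, k):
    """Density-Peak-style assignment.

    A point is a putative peak if it is denser than all of its k nearest
    neighbours; every other point inherits the label of its nearest
    neighbour of higher density.
    """
    n = len(log_rho)
    by_density = np.argsort(-log_rho)          # densest first
    rank = np.empty(n, dtype=int)
    rank[by_density] = np.arange(n)            # rank 0 = densest point
    labels = np.full(n, -1)
    centers = []
    for i in by_density:
        if rank[i] < rank[order[i, 1:k + 1]].min():
            labels[i] = len(centers)           # new putative peak
            centers.append(i)
        else:
            higher = by_density[:rank[i]]      # all points denser than i
            labels[i] = labels[higher[np.argmin(dist[i, higher])]]
    return labels, np.array(centers)
```

With a finite sample this sketch typically finds more putative peaks than the underlying density has, because some local maxima are sampling fluctuations. That is the problem the error estimate addresses: a statistical test could compare the log-density gap between a peak and the saddle toward a neighbouring peak against their combined errors, and merge the two peaks when the gap is not significant. The merging step is not shown here.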
