TL;DR
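This paper introduces an intrinsic dimension (ID) estimator that relies on minimal neighborhood information: the distances from each point to its first and second nearest neighbors. It is designed to recover the true dimensionality of complex datasets despite noise, curvature, and non-uniform density, while keeping the computational cost low.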
Contribution
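A novel intrinsic dimension estimator that uses only the distances to each point's first and second nearest neighbors. Restricting the estimate to this small neighborhood reduces the effects of manifold curvature and density variation and lowers the computational cost.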
Findings
The estimator is theoretically exact for uniformly distributed data.
It provides consistent measures for general datasets.
It is effective at distinguishing the relevant dimensions in noisy, high-dimensional data.
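The exactness claim for uniform data can be made concrete with a short derivation. This is a sketch rather than a quotation from the paper: the symbols r_1, r_2, \mu, d, and N are introduced here for illustration, and the density is assumed to be approximately constant on the scale of the second neighbor distance.

```latex
% r_1, r_2: distances from a point to its first and second nearest neighbors
% d: intrinsic dimension, N: number of points (notation assumed here)
\mu = \frac{r_2}{r_1} \ge 1, \qquad
f(\mu) = d\,\mu^{-(d+1)}, \qquad
F(\mu) = \int_1^{\mu} f(t)\,dt = 1 - \mu^{-d}

% Solving the cumulative distribution for d:
d = -\frac{\log\bigl(1 - F(\mu)\bigr)}{\log \mu}

% Maximum-likelihood estimate from N independent ratios \mu_1, \dots, \mu_N:
\hat{d} = \frac{N}{\sum_{i=1}^{N} \log \mu_i}
```

Because the distribution of \mu depends on d but not on the local density, only the two nearest neighbors of each point enter the estimate.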
Abstract
Analyzing large volumes of high-dimensional data is an issue of fundamental importance in data science, molecular simulations and beyond. Several approaches work on the assumption that the important content of a dataset belongs to a manifold whose Intrinsic Dimension (ID) is much lower than the crude large number of coordinates. Such a manifold is generally twisted and curved; in addition, the points on it will be non-uniformly distributed. These two factors make identifying the ID and exploiting it really hard. Here we propose a new ID estimator using only the distances to the first and the second nearest neighbor of each point in the sample. This extreme minimality enables us to reduce the effects of curvature, of density variation, and the resulting computational cost. The ID estimator is theoretically exact in uniformly distributed datasets, and provides consistent measures in…
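To illustrate how an estimator of this kind can be built from the first and second nearest neighbor distances alone, here is a minimal Python sketch. It is not the authors' implementation: the function name `two_nn_id` is hypothetical, the brute-force distance matrix is chosen only for brevity, and the final step uses the maximum-likelihood form d = N / Σ log(r2/r1), which assumes the ratio r2/r1 follows a Pareto law with exponent d under locally constant density. The fitting procedure used in the paper may differ.

```python
import numpy as np


def two_nn_id(points):
    """Estimate intrinsic dimension from ratios of 2nd to 1st neighbor distances.

    Hypothetical helper for illustration; assumes locally constant density.
    """
    x = np.asarray(points, dtype=float)

    # Pairwise squared Euclidean distances via the Gram matrix.
    # O(N^2) memory, so this is only suitable for small samples.
    sq_norms = np.sum(x ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbor

    # The two smallest entries of each row: column 0 <= column 1.
    nearest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(nearest[:, 0])
    r2 = np.sqrt(nearest[:, 1])

    # Drop duplicated points (r1 == 0), for which the ratio is undefined.
    keep = r1 > 0
    mu = r2[keep] / r1[keep]

    # Maximum-likelihood estimate of the Pareto exponent.
    return mu.size / np.sum(np.log(mu))
```

For points drawn uniformly from a two-dimensional square and embedded in a higher-dimensional space by padding with zeros, the estimate should come out close to 2 regardless of the number of coordinates. For larger samples, a tree-based neighbor search would be a natural replacement for the dense distance matrix.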
