Scale-adaptive and robust intrinsic dimension estimation via optimal neighbourhood identification
Antonio Di Noia, Iuri Macocco, Aldo Glielmo, Alessandro Laio, Antonietta Mira

TL;DR
This paper introduces an automatic method to identify the optimal scale for intrinsic dimension estimation, ensuring meaningful and robust results across different datasets and noise conditions.
Contribution
It presents a self-consistent protocol that selects the appropriate scale for ID estimation by enforcing constant data density at small scales.
Findings
The method accurately identifies the correct scale in artificial datasets.
It demonstrates robustness to noise in real-world datasets.
The protocol improves the reliability of intrinsic dimension estimates.
Abstract
The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection, as it is a lower bound to the number of variables which are necessary to describe a system. However, in almost any real-world dataset the ID depends on the scale at which the data are analysed. Quite typically, at small scales the ID is very large, because the data are affected by measurement errors. At large scales, the ID can also appear erroneously large, due to the curvature and the topology of the manifold containing the data. In this work, we introduce an automatic protocol to select the sweet spot, namely the correct range of scales in which the ID is meaningful and useful. This protocol is based on imposing that, for distances smaller than the correct scale, the density of the data is constant. In the presented framework, estimating the density requires knowing the ID; therefore, this…
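The scale dependence described in the abstract can be illustrated with a small numerical experiment. The sketch below is not the paper's protocol: it swaps in a generic nearest-neighbour maximum-likelihood ID estimator (in the pooled form, where the mean log distance ratio estimates 1/ID) and uses the neighbourhood size `k` as a stand-in for scale. The function names `knn_distances` and `mle_id`, the synthetic dataset (a 2D sheet embedded in 5 dimensions), and the noise level are all assumptions made for illustration.

```python
import numpy as np

def knn_distances(X, k):
    """Sorted distances from each point to its k nearest neighbours (brute force)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared pairwise distances
    np.maximum(d2, 0.0, out=d2)                       # clip tiny negative round-off
    np.fill_diagonal(d2, np.inf)                      # exclude each point from its own neighbours
    nearest = np.partition(d2, k - 1, axis=1)[:, :k]  # k smallest per row, unordered
    return np.sqrt(np.sort(nearest, axis=1))

def mle_id(X, k):
    """Pooled nearest-neighbour MLE of the intrinsic dimension at neighbourhood size k."""
    r = knn_distances(X, k)
    log_ratios = np.log(r[:, -1:] / r[:, :-1])        # log(r_k / r_j) for j = 1..k-1
    return 1.0 / log_ratios.mean()                    # mean log-ratio estimates 1/ID

# Hypothetical test data: a flat 2D sheet embedded in 5 dimensions.
rng = np.random.default_rng(0)
n = 1500
sheet = np.zeros((n, 5))
sheet[:, :2] = rng.uniform(size=(n, 2))
noisy = sheet + 0.03 * rng.normal(size=(n, 5))        # isotropic "measurement error"

id_clean = mle_id(sheet, k=20)    # noise-free reference
id_small = mle_id(noisy, k=5)     # small scale: neighbours separated mostly by noise
id_large = mle_id(noisy, k=200)   # larger scale: the sheet dominates

print(f"clean sheet, k=20:  {id_clean:.2f}")
print(f"noisy sheet, k=5:   {id_small:.2f}")
print(f"noisy sheet, k=200: {id_large:.2f}")
```

In this toy setting the small-neighbourhood estimate on the noisy data should come out above the large-neighbourhood one, which is the small-scale inflation the abstract attributes to measurement error. Choosing `k` by hand, as done here, is exactly the step the paper proposes to automate.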
