An axiomatic approach to intrinsic dimension of a dataset
Vladimir Pestov

TL;DR
This paper refines an axiomatic framework for the intrinsic dimension of a dataset, linking it to the curse of dimensionality, manifold structure, and noise sensitivity, and discusses computational challenges together with a possible way to overcome them.
Contribution
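It develops the axiomatic approach to intrinsic dimension proposed in the author's IJCNN'07 paper, clarifies how the resulting dimension relates to the curse of dimensionality and to manifold structure, and discusses the difficulties caused by noise and computational complexity.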
Findings
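A high intrinsic dimension indicates the presence of the curse of dimensionality.
The intrinsic dimension of an i.i.d. sample from a low-dimensional manifold is, with high probability, close to the dimension of the manifold.
The intrinsic dimension of a sample is easily corrupted by moderate high-dimensional noise, and computing it is NP-complete.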
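The noise finding can be illustrated with a minimal sketch. It does not compute the paper's axiomatic intrinsic dimension; it swaps in a standard local estimator instead, a TwoNN-style maximum-likelihood estimate built from the ratio of each point's second- to first-nearest-neighbour distance. The test data (a unit circle in R^20, 400 points, Gaussian noise whose expected norm is about the circle's radius) and the helper names `two_nn_dimension` and `noisy_circle` are hypothetical choices for illustration only.

```python
import math
import random


def two_nn_dimension(points):
    """TwoNN-style intrinsic dimension estimate (maximum-likelihood form).

    Swapped-in estimator, NOT the paper's axiomatic dimension. For each point,
    mu = r2 / r1 is the ratio of its second- to first-nearest-neighbour
    distance; for locally uniform d-dimensional data mu is Pareto with
    shape d, whose MLE is N / sum(log mu).
    """
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            r = math.dist(p, q)
            if r < r1:
                r1, r2 = r, r1  # new nearest; old nearest becomes second
            elif r < r2:
                r2 = r
        log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


def noisy_circle(n, ambient_dim, noise_norm, rng):
    """Hypothetical test set: unit circle in the first two coordinates of
    R^ambient_dim plus isotropic Gaussian noise of expected norm ~noise_norm."""
    sigma = noise_norm / math.sqrt(ambient_dim)
    pts = []
    for _ in range(n):
        t = rng.uniform(0.0, 2.0 * math.pi)
        x = [math.cos(t), math.sin(t)] + [0.0] * (ambient_dim - 2)
        pts.append([c + rng.gauss(0.0, sigma) for c in x])
    return pts


rng = random.Random(0)
clean = noisy_circle(400, 20, 0.0, rng)  # a 1-dimensional manifold
noisy = noisy_circle(400, 20, 1.0, rng)  # noise as large as the circle
d_clean = two_nn_dimension(clean)
d_noisy = two_nn_dimension(noisy)
print(f"clean estimate: {d_clean:.2f}, noisy estimate: {d_noisy:.2f}")
```

The clean sample gives an estimate near 1. Once noise of the same amplitude as the circle is added, nearest-neighbour distances are dominated by the 20-dimensional noise and the estimate becomes much larger. The brute-force O(N²) neighbour search keeps the sketch dependency-free and is only meant for small samples.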
Abstract
We perform a deeper analysis of an axiomatic approach to the concept of intrinsic dimension of a dataset proposed by us in the IJCNN'07 paper (arXiv:cs/0703125). The main features of our approach are that a high intrinsic dimension of a dataset reflects the presence of the curse of dimensionality (in a certain mathematically precise sense), and that the dimension of a discrete i.i.d. sample of a low-dimensional manifold is, with high probability, close to that of the manifold. At the same time, the intrinsic dimension of a sample is easily corrupted by moderate high-dimensional noise (of the same amplitude as the size of the manifold) and suffers from prohibitively high computational complexity (computing it is an NP-complete problem). We outline a possible way to overcome these difficulties.
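The paper makes the link with the curse of dimensionality mathematically precise. The snippet below only illustrates one widely cited informal symptom of that curse, the concentration of distances. As the dimension of uniformly distributed data grows, the gap between the nearest and the farthest point from a query shrinks relative to the nearest distance. The function `relative_contrast`, the unit-cube data and the chosen dimensions are hypothetical, picked for illustration, and are not the paper's definition.

```python
import math
import random


def relative_contrast(dim, n_points, rng):
    """(max - min) / min of distances from one random query point to n_points
    uniform points in [0, 1]^dim. Illustration only, not the paper's measure."""
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


rng = random.Random(1)
contrast = {d: relative_contrast(d, 1000, rng) for d in (2, 10, 100, 1000)}
for d, c in contrast.items():
    print(f"dim={d:5d}  relative contrast={c:.3f}")
```

In low dimension the nearest point is much closer than the farthest one. In high dimension all distances become nearly equal, which is one reason why nearest-neighbour methods degrade.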
Taxonomy
Topics: Data Management and Algorithms · Data Visualization and Analytics · Time Series Analysis and Forecasting
