Intrinsic dimension of a dataset: what properties does one expect?
Vladimir Pestov

TL;DR
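This paper introduces an axiomatic framework for defining the intrinsic dimension of a dataset: a high dimension must indicate the presence of the curse of dimensionality, the dimension must depend smoothly on a distance between datasets, and a normalization condition fixes the dimension of Euclidean spheres.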
Contribution
It formalizes the concept of intrinsic dimension with axioms and proposes a new dimension function satisfying these axioms, advancing the theoretical understanding of dataset geometry.
Findings
Proposed an axiomatic approach to dataset intrinsic dimension.
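Provided an example of a dimension function satisfying the axioms, although it is in general computationally unfeasible.
Discussed a computationally cheap alternative, the "intrinsic dimensionality" of Chávez et al., which satisfies most but not all of the axioms.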
Abstract
We propose an axiomatic approach to the concept of an intrinsic dimension of a dataset, based on a viewpoint of geometry of high-dimensional structures. Our first axiom postulates that high values of dimension be indicative of the presence of the curse of dimensionality (in a certain precise mathematical sense). The second axiom requires the dimension to depend smoothly on a distance between datasets (so that the dimension of a dataset and that of an approximating principal manifold would be close to each other). The third axiom is a normalization condition: the dimension of the Euclidean $n$-sphere $\mathbb{S}^n$ is $\Theta(n)$. We give an example of a dimension function satisfying our axioms, even though it is in general computationally unfeasible, and discuss a computationally cheap function satisfying most but not all of our axioms (the "intrinsic dimensionality" of Chávez et al.).
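To make the "computationally cheap function" concrete, here is a minimal Python sketch of the intrinsic dimensionality of Chávez et al. The abstract does not state the formula. The sketch assumes the definition usually attributed to that work, $\rho = \mu^2 / (2\sigma^2)$, where $\mu$ and $\sigma^2$ are the mean and variance of the pairwise distances in the dataset. The function names `chavez_intrinsic_dimensionality` and `sample_sphere` are hypothetical helpers introduced for illustration; they are not taken from the paper.

```python
import math
import random
from itertools import combinations


def chavez_intrinsic_dimensionality(points):
    """Estimate rho = mu^2 / (2 * sigma^2) from all pairwise Euclidean distances.

    Assumption: this is the formula attributed to Chavez et al.; it is not
    spelled out in the abstract above.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    mean = sum(dists) / len(dists)
    # Population variance of the empirical distance distribution.
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    if var == 0.0:
        raise ValueError("all pairwise distances are equal; rho is undefined")
    return mean * mean / (2.0 * var)


def sample_sphere(n_points, ambient_dim, rng):
    """Draw points uniformly from the unit sphere in R^ambient_dim
    by normalizing standard Gaussian vectors (hypothetical helper)."""
    pts = []
    for _ in range(n_points):
        v = [rng.gauss(0.0, 1.0) for _ in range(ambient_dim)]
        norm = math.sqrt(sum(x * x for x in v))
        pts.append([x / norm for x in v])
    return pts
```

Calling `chavez_intrinsic_dimensionality(sample_sphere(200, d, random.Random(0)))` for increasing `d` should give increasing values, because pairwise distances on a high-dimensional sphere concentrate around their mean. This growth with the dimension of the sphere is the qualitative behaviour that the normalization axiom asks for. The estimate is also unchanged when all points are rescaled by a common factor, since both $\mu^2$ and $\sigma^2$ scale by the square of that factor.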
