A Survey of Dimension Estimation Methods
James A. D. Binnie, Paweł Dłotko, John Harvey, Jakub Malinowski, Ka Man Yim

TL;DR
This survey reviews various methods for estimating the intrinsic dimension of high-dimensional datasets, evaluating their performance, robustness, and limitations across different geometric and noise conditions.
Contribution
It categorizes dimension estimation techniques based on geometric information and provides a comprehensive performance evaluation and guidance for their reliable application.
Findings
Tangential estimators effectively detect local affine structures.
Parametric estimators depend heavily on distribution assumptions.
Many estimators' hyperparameters are overfit to benchmark datasets.
Abstract
It is a standard assumption that datasets in high dimension have an internal structure which means that they in fact lie on, or near, subsets of a lower dimension. In many instances it is important to understand the real dimension of the data, hence the complexity of the dataset at hand. A great variety of dimension estimators have been developed to find the intrinsic dimension of the data but there is little guidance on how to reliably use these estimators. This survey reviews a wide range of dimension estimation methods, categorising them by the geometric information they exploit: tangential estimators which detect a local affine structure; parametric estimators which rely on dimension-dependent probability distributions; and estimators which use topological or metric invariants. The paper evaluates the performance of these methods, as well as investigating varying responses to…
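To make the first two categories concrete, here is a minimal sketch of one estimator of each kind. It is an illustration, not the survey's own implementation. It assumes a local PCA estimator as the example of a tangential method and the Levina-Bickel maximum-likelihood estimator as the example of a parametric one. The function names, the neighbourhood size `k`, and the `variance_threshold` are hypothetical choices made for this example only.

```python
import numpy as np


def knn_indices_and_distances(X, k):
    """Brute-force k nearest neighbours of every point, excluding the point itself."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.argsort(dist, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    return idx, np.take_along_axis(dist, idx, axis=1)


def local_pca_dimension(X, k=20, variance_threshold=0.95):
    """Tangential-style estimate: count the principal components needed to
    explain `variance_threshold` of the variance in each k-neighbourhood,
    then take the median over all points."""
    idx, _ = knn_indices_and_distances(X, k)
    dims = []
    for neighbours in idx:
        patch = X[neighbours] - X[neighbours].mean(axis=0)
        s = np.linalg.svd(patch, compute_uv=False)
        explained = np.cumsum(s ** 2) / np.sum(s ** 2)
        dims.append(int(np.searchsorted(explained, variance_threshold)) + 1)
    return float(np.median(dims))


def mle_dimension(X, k=20):
    """Parametric-style estimate (Levina-Bickel form): the local estimate at x
    is (k - 1) / sum_j log(T_k(x) / T_j(x)), where T_j is the distance to the
    j-th nearest neighbour; local estimates are averaged over all points.
    Assumes no duplicate points (zero distances would break the logarithm)."""
    _, d = knn_indices_and_distances(X, k)
    log_ratios = np.log(d[:, -1:] / d[:, :-1])  # log(T_k / T_j), j = 1..k-1
    local = (k - 1) / log_ratios.sum(axis=1)
    return float(local.mean())


# Usage: a flat 2-dimensional patch embedded isometrically in R^5.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(500, 2))
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))  # orthonormal columns
X = latent @ Q.T

print("local PCA estimate:", local_pca_dimension(X))
print("MLE estimate:      ", mle_dimension(X))
```

Both functions depend on `k`, and the local PCA estimate also depends on the variance threshold, so estimates like these should be checked across a range of hyperparameter values before being trusted on real data.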
Taxonomy
Topics: Statistical Methods and Inference · Topological and Geometric Data Analysis · Advanced Statistical Methods and Models
