Intrinsic Dimension for Large-Scale Geometric Learning
Maximilian Stubbemann, Tom Hanika, Friedrich Martin Schneider

TL;DR
This paper introduces a computationally feasible method to determine the intrinsic dimension of large-scale geometric data, incorporating neighborhood information, and demonstrates its application to graph learning datasets.
Contribution
It provides a new axiomatic approach to intrinsic dimension that is scalable and accounts for geometric and neighborhood structures in data.
Findings
The method is computationally feasible for large datasets.
It effectively incorporates neighborhood information in intrinsic dimension estimation.
Experiments on the Open Graph Benchmark validate the approach.
Abstract
The concept of dimension is essential to grasp the complexity of data. A naive approach to determining the dimension of a dataset is based on the number of attributes. More sophisticated methods derive a notion of intrinsic dimension (ID) that employs more complex feature functions, e.g., distances between data points. Yet many of these approaches are based on empirical observations, cannot cope with the geometric character of contemporary datasets, and lack an axiomatic foundation. A different approach was proposed by V. Pestov, who links the intrinsic dimension axiomatically to the mathematical concentration of measure phenomenon. The first methods to compute this and related notions of ID were computationally intractable for large-scale real-world datasets. In the present work, we derive a computationally feasible method for determining said axiomatic ID functions. Moreover, we…
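To make the link between feature functions and concentration of measure more concrete, here is a minimal sketch of how concentration can be measured on a finite dataset. It is not the paper's algorithm: the function names `partial_diameter` and `observable_diameter`, the use of the normalized counting measure, and the representation of each feature function as a plain list of its values are assumptions made for illustration, following the general Gromov-style notion of observable diameter.

```python
import math

def partial_diameter(values, mass):
    """Smallest width of an interval that contains at least a fraction
    `mass` (0 < mass <= 1) of the values, under the counting measure."""
    v = sorted(values)
    n = len(v)
    # Number of points a window must cover (small tolerance guards
    # against floating-point round-up in mass * n).
    k = max(1, math.ceil(mass * n - 1e-12))
    # Slide a window of k consecutive sorted values and keep the narrowest.
    return min(v[i + k - 1] - v[i] for i in range(n - k + 1))

def observable_diameter(feature_values, alpha):
    """Largest partial diameter (at mass 1 - alpha) over all feature functions.

    `feature_values` is a list of lists: one list of values per feature
    function, each evaluated on every data point (an assumed layout).
    """
    return max(partial_diameter(vals, 1.0 - alpha) for vals in feature_values)

# Toy data: two hypothetical feature functions evaluated on four points.
features = [[0, 1, 2, 10], [0, 0, 0, 0]]
result = observable_diameter(features, alpha=0.5)
```

A small observable diameter means every feature function concentrates most of its mass in a narrow interval. In concentration-based approaches such as the one attributed to Pestov above, stronger concentration corresponds to higher intrinsic dimension.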
Taxonomy
Topics: Advanced Graph Neural Networks · Bayesian Modeling and Causal Inference · Data Visualization and Analytics
