Intrinsic Dimensionality Estimation within Tight Localities: A Theoretical and Experimental Analysis
Laurent Amsaleg (CNRS-IRISA, France), Oussama Chelly (Amazon Web Services, Munich, Germany), Michael E. Houle (The University of Melbourne, Australia), Ken-ichi Kawarabayashi (National Institute of Informatics, Japan), Miloš Radovanović (University of Novi Sad, Serbia)

TL;DR
This paper introduces a local intrinsic dimensionality estimation method that remains stable at very small sample sizes, achieving lower variance than existing techniques at comparable bias.
Contribution
It proposes a novel ID estimator based on extreme-value theory that works effectively with as few as 20 points, whereas previous methods generally need neighborhoods on the order of hundreds of points to converge.
Findings
Achieves smaller variance than existing estimators.
Maintains comparable bias levels.
Effective with small local sample sizes.
Abstract
Accurate estimation of Intrinsic Dimensionality (ID) is of crucial importance in many data mining and machine learning tasks, including dimensionality reduction, outlier detection, similarity search and subspace clustering. However, since their convergence generally requires sample sizes (that is, neighborhood sizes) on the order of hundreds of points, existing ID estimation methods may have only limited usefulness for applications in which the data consists of many natural groups of small size. In this paper, we propose a local ID estimation strategy stable even for 'tight' localities consisting of as few as 20 sample points. The estimator applies MLE techniques over all available pairwise distances among the members of the sample, based on a recent extreme-value-theoretic model of intrinsic dimensionality, the Local Intrinsic Dimension (LID). Our experimental results show that our…
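For orientation, here is a minimal sketch of the classical MLE (Hill-type) LID estimator computed from the distances between a query point and its k nearest neighbors. This is the kind of baseline whose stability depends on neighborhood size; it is not the pairwise-distance estimator the paper proposes. The function names `lid_mle` and `knn_distances`, the synthetic data (a 3-dimensional uniform cloud embedded in 10 dimensions), and the choices k = 100 and k = 20 are all assumptions made for illustration.

```python
import numpy as np

def lid_mle(distances):
    """Classical MLE (Hill-type) LID estimate from neighbor distances.

    Computes -1 / mean(log(r_i / r_max)) over the positive distances.
    Hypothetical helper for illustration, not the paper's estimator.
    """
    r = np.sort(np.asarray(distances, dtype=float))
    r = r[r > 0]                      # drop zero distances (exact duplicates)
    mean_log = np.mean(np.log(r / r[-1]))
    if mean_log == 0:                 # all distances equal: estimate undefined
        return float("inf")
    return -1.0 / mean_log

def knn_distances(data, query, k):
    """Return the k smallest positive Euclidean distances from query to data."""
    d = np.sort(np.linalg.norm(data - query, axis=1))
    return d[d > 0][:k]

# Synthetic example (assumed setup): 3-D uniform data embedded in 10-D.
rng = np.random.default_rng(0)
latent = rng.uniform(-1.0, 1.0, size=(5000, 3))
data = np.hstack([latent, np.zeros((5000, 7))])
query = np.zeros(10)

est_large = lid_mle(knn_distances(data, query, 100))  # large neighborhood
est_small = lid_mle(knn_distances(data, query, 20))   # 'tight' neighborhood
print(f"k=100: {est_large:.2f}   k=20: {est_small:.2f}")
```

Repeating the k = 20 estimate over many query points would be expected to show a noticeably wider spread than k = 100, which is the small-sample instability the abstract refers to.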
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Statistical Methods and Inference · Advanced Statistical Methods and Models
