On the Use of Bagging for Local Intrinsic Dimensionality Estimation
Kristóf Péter, Ricardo J. G. B. Campello, James Bailey, Michael E. Houle

TL;DR
This paper introduces a bagging ensemble method for Local Intrinsic Dimensionality estimation that reduces variance and improves accuracy by carefully balancing sampling rate and neighborhood size, with theoretical and experimental validation.
Contribution
It proposes a novel ensemble approach using subbagging for LID estimation, analyzing its effects on bias and variance, and offers practical guidance for hyper-parameter selection.
Findings
Bagging significantly reduces variance and mean squared error in LID estimation.
The interplay between sampling rate and neighborhood size critically affects estimation performance.
Combining bagging with neighborhood smoothing yields further improvements.
Abstract
The theory of Local Intrinsic Dimensionality (LID) has become a valuable tool for characterizing local complexity within and across data manifolds, supporting a range of data mining and machine learning tasks. Accurate LID estimation requires samples drawn from small neighborhoods around each query to avoid biases from nonlocal effects and potential manifold mixing, yet limited data within such neighborhoods tends to cause high estimation variance. As a variance reduction strategy, we propose an ensemble approach that uses subbagging to preserve the local distribution of nearest neighbor (NN) distances. The main challenge is that the uniform reduction in total sample size within each subsample increases the proximity threshold for finding a fixed number k of NNs around the query. As a result, in the specific context of LID estimation, the sampling rate has an additional, complex…
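The subbagging idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard maximum-likelihood (Hill-type) LID estimator as the base estimator and a plain average as the aggregation rule, neither of which is confirmed by the excerpt above. The function names `lid_mle`, `knn_dists`, and `subbagged_lid`, and the parameters `rate` and `n_bags`, are hypothetical. The sketch also reports the mean k-NN radius across subsamples, which shows the effect the abstract mentions: drawing fewer points pushes the k-th neighbor farther from the query.

```python
import numpy as np

def lid_mle(dists):
    """Hill-type maximum-likelihood LID estimate from sorted, positive NN distances.
    (Assumed base estimator; the excerpt does not name one.)"""
    dists = np.asarray(dists, dtype=float)
    w = dists[-1]  # distance to the k-th neighbor, i.e. the neighborhood radius
    return -1.0 / np.mean(np.log(dists / w))

def knn_dists(query, data, k):
    """Sorted distances from `query` to its k nearest neighbors in `data`."""
    d = np.linalg.norm(data - query, axis=1)
    d = d[d > 0]  # drop zero distances (e.g. the query itself, if it is in data)
    return np.sort(d)[:k]

def subbagged_lid(query, data, k, rate, n_bags, rng):
    """Average the base estimate over `n_bags` subsamples drawn WITHOUT
    replacement, each holding roughly a fraction `rate` of the data."""
    n = data.shape[0]
    m = max(k + 1, int(round(rate * n)))  # subsample size, at least k + 1 points
    estimates, radii = [], []
    for _ in range(n_bags):
        idx = rng.choice(n, size=m, replace=False)
        d = knn_dists(query, data[idx], k)
        estimates.append(lid_mle(d))
        radii.append(d[-1])
    return float(np.mean(estimates)), float(np.mean(radii))

# Toy data: a 2-dimensional uniform patch embedded in 5-dimensional space.
rng = np.random.default_rng(0)
data = np.zeros((2000, 5))
data[:, :2] = rng.uniform(-1.0, 1.0, size=(2000, 2))
query = np.zeros(5)

full = knn_dists(query, data, 20)  # neighborhood in the full data set
est, radius = subbagged_lid(query, data, k=20, rate=0.5, n_bags=50, rng=rng)
```

Here `radius` can be compared against `full[-1]` to see how far the sampling rate widens the neighborhood for a fixed k, which is why the sampling rate and k have to be tuned together rather than independently.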
Taxonomy
Topics: Face and Expression Recognition · Advanced Statistical Methods and Models · Advanced Clustering Algorithms Research
