Interpreting the Curse of Dimensionality from Distance Concentration and Manifold Effect
Dehua Peng, Zhipeng Gui, Huayi Wu

TL;DR
This paper analyzes how high dimensionality affects distance measures and data structure, showing that traditional nearest neighbor search becomes ineffective and that PCA variance concentrates in fewer dimensions, both of which degrade model performance.
Contribution
It provides a theoretical and empirical analysis of the causes of the curse of dimensionality, focusing on distance concentration and manifold effects.
Findings
Nearest neighbor search becomes meaningless in high dimensions.
PCA variance concentrates in fewer dimensions as dimensionality increases.
Distance measures like Minkowski, Chebyshev, and cosine lose effectiveness.
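The distance concentration finding can be illustrated with a small simulation. This is a minimal sketch, not the authors' experiment: it assumes points drawn uniformly from the unit hypercube, uses Euclidean distance only, and the function name `relative_contrast` and its parameters are hypothetical. It measures how far the farthest point is from a query compared with the nearest one; when that ratio approaches zero, "nearest" carries little information.

```python
import numpy as np


def relative_contrast(dim, n_points=500, seed=0):
    """Return (d_max - d_min) / d_min for one random query point.

    Assumption: points and query are i.i.d. uniform on [0, 1]^dim.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))   # candidate neighbors
    query = rng.random(dim)                # query point
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances
    return (dists.max() - dists.min()) / dists.min()


# The contrast should shrink as the dimension grows.
for dim in (2, 10, 100, 1000):
    print(dim, relative_contrast(dim))
```

Under these assumptions the printed contrast should fall sharply between 2 and 1000 dimensions, which is the sense in which nearest and farthest neighbors become hard to tell apart.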
Abstract
The characteristics of data, such as distribution and heterogeneity, become more complex and counterintuitive as dimensionality increases. This phenomenon is known as the curse of dimensionality, where common patterns and relationships (e.g., internal pattern and boundary pattern) that hold in low-dimensional space may be invalid in higher-dimensional space. It leads to decreasing performance for regression, classification, or clustering models and algorithms. The curse of dimensionality can be attributed to many causes. In this paper, we first summarize the potential challenges associated with manipulating high-dimensional data, and explain the possible causes for the failure of regression, classification, or clustering tasks. Subsequently, we delve into two major causes of the curse of dimensionality, distance concentration and manifold effect, by performing theoretical and empirical…
Taxonomy
Topics: Face and Expression Recognition · Advanced Clustering Algorithms Research · Neural Networks and Applications
