Balancing Geometry and Density: Path Distances on High-Dimensional Data
Anna Little, Daniel McKenzie, James Murphy

TL;DR
This paper analyzes power-weighted shortest-path distances (PWSPDs) on high-dimensional data, showing how they balance geometric structure against data density, and provides theoretical guarantees and practical guidance for machine learning applications.
Contribution
The paper provides new geometric and computational insights into PWSPDs, including their relationship to nearest neighbor graphs, their connections with percolation theory, and their dependence on intrinsic data dimension.
Findings
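PWSPDs balance density and geometry in the underlying data, which clarifies how their key parameters may be chosen in practice.
High-probability, near-optimal guarantees relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs.
Illustrative experiments demonstrate the versatility of PWSPDs across data settings.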
Abstract
New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented. By illuminating the way these metrics balance density and geometry in the underlying data, we clarify their key parameters and discuss how they may be chosen in practice. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs, providing high probability guarantees on their equivalence that are near-optimal. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a…
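To make the complete-graph versus nearest-neighbor-graph comparison concrete, here is a minimal sketch in pure Python. It assumes the usual PWSPD definition, which this summary does not spell out: the distance between two points is the minimum over paths of (sum of Euclidean edge lengths raised to the power p) raised to 1/p. The function name `pwspd`, the parameter `k`, and the symmetrized k-nearest-neighbor construction are illustrative choices, not taken from the paper.

```python
import heapq
import math


def pwspd(points, p, k=None):
    """All-pairs power-weighted shortest-path distances (illustrative sketch).

    Assumed definition: min over paths of (sum of edge_length**p) ** (1/p).
    k=None uses the complete graph; otherwise a symmetrized k-nearest-neighbor
    graph (an edge is kept if either endpoint lists the other as a neighbor).
    """
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]

    # Build the adjacency sets for the chosen graph.
    adj = [set() for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors = order if k is None else order[:k]
        for j in neighbors:
            adj[i].add(j)
            adj[j].add(i)

    # Dijkstra from every source, with edge weights dist**p.
    result = []
    for source in range(n):
        cost = [math.inf] * n
        cost[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            c, u = heapq.heappop(heap)
            if c > cost[u]:
                continue  # stale heap entry
            for v in adj[u]:
                new_cost = c + dist[u][v] ** p
                if new_cost < cost[v]:
                    cost[v] = new_cost
                    heapq.heappush(heap, (new_cost, v))
        # Apply the 1/p root so that p=1 recovers the Euclidean distance.
        result.append([c ** (1.0 / p) for c in cost])
    return result


# Three collinear points: with p=2 the path through the midpoint is cheaper
# than the direct edge, so small hops are rewarded.
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
d_complete = pwspd(line, p=2)[0][2]      # sqrt(1**2 + 1**2)
d_knn = pwspd(line, p=2, k=1)[0][2]      # same path survives in the 1-NN graph
d_euclid = pwspd(line, p=1)[0][2]        # p=1 gives the straight-line distance
```

In this toy case the 1-nearest-neighbor graph already reproduces the complete-graph distance, because the optimal path uses only short edges. In general a sparse graph can only lengthen or disconnect paths, so the sketch says nothing about how large `k` must be for the two to agree; that is the question the paper's high-probability guarantees address.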
