High Dimensional Cluster Analysis Using Path Lengths
Kevin McIlhany, Stephen Wiggins

TL;DR
This paper introduces a hierarchical clustering scheme for high-dimensional data using path lengths, spectral methods, and a novel Line-Of-Sight algorithm, evaluated on diverse datasets to improve robustness and accuracy.
Contribution
The paper presents new clustering techniques based on path lengths between connected partitions, together with a Line-Of-Sight algorithm, for analyzing high-dimensional data.
Findings
Path length-based clustering improves high-dimensional clustering accuracy.
The Line-Of-Sight algorithm offers a novel approach for data partitioning.
Consensus-based techniques increase robustness across datasets.
Abstract
A hierarchical scheme for clustering data is presented which applies to spaces with a high number of dimensions. The data set is first reduced to a smaller set of partitions (multi-dimensional bins). Multiple clustering techniques are used, including spectral clustering; new techniques are also introduced, based on the path length between partitions that are connected to one another. A Line-Of-Sight algorithm is also developed for clustering. A test bank of 12 data sets with varying properties is used to expose the strengths and weaknesses of each technique. Finally, a robust clustering technique is discussed based on reaching a consensus among the multiple approaches, overcoming the weaknesses found individually.
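The first two steps of the abstract (reducing the data to multi-dimensional bins, then measuring path lengths between connected bins) can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed bin width, the rule that bins are "connected" when their indices differ by at most one in every dimension, the use of hop count as the path length, and all function names are assumptions made for illustration only.

```python
from collections import deque
from itertools import product


def bin_points(points, bin_width):
    """Map each point to the integer index of the multi-dimensional bin containing it."""
    bins = {}
    for i, p in enumerate(points):
        key = tuple(int(x // bin_width) for x in p)
        bins.setdefault(key, []).append(i)
    return bins


def neighbors(key):
    """Yield every bin index touching `key` (assumed connectivity rule: faces, edges, corners)."""
    for offset in product((-1, 0, 1), repeat=len(key)):
        if any(offset):
            yield tuple(k + o for k, o in zip(key, offset))


def path_lengths_from(start, occupied):
    """Breadth-first search: hop count from `start` to every occupied bin reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nb in neighbors(cur):
            if nb in occupied and nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    return dist


def cluster_bins(bins):
    """Label occupied bins so that bins joined by a finite path share a cluster label."""
    labels = {}
    next_label = 0
    for key in bins:
        if key not in labels:
            for reached in path_lengths_from(key, bins):
                labels[reached] = next_label
            next_label += 1
    return labels


# Toy 4-dimensional data: two well-separated groups (made-up values).
points = [(0.1, 0.2, 0.1, 0.3), (0.6, 0.7, 0.4, 0.2), (1.2, 0.9, 0.8, 0.1),
          (5.1, 5.3, 5.2, 5.0), (5.8, 5.6, 5.9, 5.4)]
bins = bin_points(points, bin_width=0.5)
labels = cluster_bins(bins)
```

Here `labels` maps each occupied bin to a cluster label, and `path_lengths_from` gives the hop distance between any two bins in the same cluster. Enumerating all neighboring bin indices grows as 3^d with the number of dimensions d, so for genuinely high-dimensional data a sketch like this would more plausibly compare occupied bins pairwise instead.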
