A Surprisingly Simple Method for Distributed Euclidean-Minimum Spanning Tree / Single Linkage Dendrogram Construction from High Dimensional Embeddings via Distance Decomposition
Richard Lettich

TL;DR
The paper presents a simple decomposition method for efficiently computing exact Euclidean Minimum Spanning Trees in high-dimensional spaces, enabling scalable clustering and dendrogram construction from neural network embeddings.
Contribution
It introduces a novel decomposition approach for distributed calculation of geometric minimum spanning trees in high dimensions, addressing limitations of existing low-dimensional algorithms.
Findings
Enables exact MST computation in high-dimensional embeddings
Facilitates efficient construction of single linkage dendrograms
Applicable to neural network embedding clustering
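The link between a minimum spanning tree and a single linkage dendrogram can be illustrated with a short sketch. It assumes the MST is already available as a list of (weight, u, v) edges over vertices 0..n-1; the function name mst_to_single_linkage and the output row layout (the two merged cluster ids, merge height, merged size) are illustrative choices, not taken from the paper. Processing the edges in increasing weight order with a union-find structure yields the merges of the dendrogram.

```python
def mst_to_single_linkage(n, mst_edges):
    """Turn MST edges (weight, u, v) on vertices 0..n-1 into merge records.

    Returns rows (cluster_a, cluster_b, height, size). Original points have
    ids 0..n-1 and the i-th merge creates cluster id n + i.
    The name and row layout are hypothetical, chosen for illustration.
    """
    parent = list(range(n))       # union-find parent pointers
    cluster_id = list(range(n))   # dendrogram id currently held by each root
    size = [1] * n                # number of points under each root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for w, u, v in sorted(mst_edges):      # lightest edge first
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                       # not expected for a valid tree
        a, b = sorted((cluster_id[ru], cluster_id[rv]))
        if size[ru] < size[rv]:
            ru, rv = rv, ru                # attach the smaller set to the larger
        parent[rv] = ru
        size[ru] += size[rv]
        cluster_id[ru] = n + len(merges)   # id of the newly formed cluster
        merges.append((a, b, w, size[ru]))
    return merges
```

Each MST edge produces exactly one merge, so a tree on n points gives n - 1 rows, and the cost is dominated by sorting the edges.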
Abstract
We introduce a decomposition method for the distributed calculation of exact Euclidean Minimum Spanning Trees in high dimensions (where sub-quadratic algorithms are not effective), or, more generally, geometric-minimum spanning trees of complete graphs, where each vertex in the graph is represented by a vector and the weight of each edge is given by a symmetric binary 'distance' function between the vectors representing its endpoints. This is motivated by the task of clustering high-dimensional embeddings produced by neural networks, where low-dimensional algorithms are ineffective. Such geometric-minimum spanning trees are used as a subroutine in the construction of single linkage dendrograms, as the two structures can be efficiently converted into each other.
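The abstract does not spell out the decomposition itself, so the following is only a sketch of one well-known way to split an exact MST computation into independent jobs, and it may differ from the paper's method. It assumes the vertices are partitioned into num_parts groups, a dense MST is computed for every pair of groups, and the final tree is extracted from the union of those edges. This is valid because an edge of the global MST is never the heaviest edge on a cycle, so it also survives in the MST of any subset that contains both of its endpoints. All function names here are hypothetical, and math.dist stands in for the symmetric distance function.

```python
import itertools
import math

def dense_mst(points, ids, dist=math.dist):
    """Prim's algorithm on the complete graph over `ids` (quadratic in len(ids))."""
    ids = list(ids)
    m = len(ids)
    if m < 2:
        return []
    in_tree = [False] * m
    best = [math.inf] * m    # cheapest known connection of each vertex to the tree
    link = [-1] * m          # tree vertex realising that connection
    best[0] = 0.0
    edges = []
    for _ in range(m):
        j = min((k for k in range(m) if not in_tree[k]), key=best.__getitem__)
        in_tree[j] = True
        if link[j] >= 0:
            edges.append((best[j], ids[link[j]], ids[j]))
        for k in range(m):
            if not in_tree[k]:
                d = dist(points[ids[j]], points[ids[k]])
                if d < best[k]:
                    best[k], link[k] = d, j
    return edges

def kruskal(n, edges):
    """Kruskal's algorithm on a sparse candidate edge list."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def decomposed_mst(points, num_parts, dist=math.dist):
    """Assumed pairwise scheme: one independent dense MST per pair of groups."""
    n = len(points)
    if num_parts < 2:
        return dense_mst(points, range(n), dist)
    parts = [list(range(i, n, num_parts)) for i in range(num_parts)]  # arbitrary split
    candidates = set()
    for a, b in itertools.combinations(range(num_parts), 2):
        # Each pair could run on a separate worker; here they run sequentially.
        for w, u, v in dense_mst(points, parts[a] + parts[b], dist):
            candidates.add((w, min(u, v), max(u, v)))
    return kruskal(n, candidates)
```

In this sketch each pairwise job touches only two groups of vectors, and the final Kruskal pass runs on a candidate set far smaller than the full complete graph.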
Taxonomy
Topics: Data Mining Algorithms and Applications
