Hierarchical Clustering using Randomly Selected Similarities
Brian Eriksson

TL;DR
This paper demonstrates that hierarchical clustering can be effectively reconstructed using only a small, randomly selected subset of pairwise similarities, significantly reducing data requirements.
Contribution
It introduces a method for reconstructing a hierarchical clustering from similarities observed at random, together with theoretical bounds on how many similarities are needed.
Findings
The correct hierarchical clustering, down to clusters whose size is a constant fraction of the total number of items N, can be recovered with O(N log N) similarities.
A significant fraction of the clustering is recoverable using fewer than all N(N-1)/2 pairwise similarities.
The approach applies to scenarios where similarities are observed at random rather than chosen adaptively by the user.
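To give a sense of scale for the O(N log N) finding, the short sketch below compares the number of all pairwise similarities, N(N-1)/2, with an illustrative budget of c·N·ln N. The constant `c = 1.0` and the helper names are hypothetical placeholders, not values or functions from the paper; the big-O statement fixes only the growth rate, not the constant.

```python
import math


def full_pair_count(n: int) -> int:
    """Number of distinct pairwise similarities among n items: n(n-1)/2."""
    return n * (n - 1) // 2


def n_log_n_budget(n: int, c: float = 1.0) -> int:
    """Illustrative O(N log N) budget: c * n * ln(n), rounded up.

    The constant c is a hypothetical placeholder, not a value from the paper.
    """
    return math.ceil(c * n * math.log(n))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        full = full_pair_count(n)
        budget = n_log_n_budget(n)
        # Fraction of all pairs that the illustrative budget would observe.
        print(f"N={n:>6}: all pairs={full:>10}, c*N*ln N={budget:>8}, "
              f"fraction={budget / full:.4f}")
```

The fraction shrinks roughly like 2·ln N / N, so the savings only become substantial as N grows; for very small N the illustrative budget can even exceed the total number of pairs.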
Abstract
The problem of hierarchically clustering items from pairwise similarities arises across various scientific disciplines, from biology to networking. Often, applications of clustering techniques are limited by the cost of obtaining similarities between pairs of items. While prior work has reconstructed the clustering from a significantly reduced set of pairwise similarities via adaptive measurements, these techniques apply only when the user can choose which similarities to measure. In this paper, we examine reconstructing hierarchical clustering when similarities are observed at random. We derive precise bounds showing that a significant fraction of the hierarchical clustering can be recovered using fewer than all the pairwise similarities. We find that the correct hierarchical clustering down to a constant fraction of the total number of items (i.e., clusters…
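The at-random observation model described in the abstract can be illustrated with a toy. In the sketch below, each of the N(N-1)/2 pairs is observed independently with probability p, and items are grouped by connecting observed high-similarity pairs. This is a hypothetical illustration of the observation model only, not the paper's reconstruction algorithm. The two-cluster ground truth, the 0/1 `similarity` function, the 0.5 threshold, and the choice `p = 3 ln N / N` are all assumptions made for the example.

```python
import math
import random
from itertools import combinations


def sample_pairs_at_random(n, p, rng):
    """Observe each of the n(n-1)/2 pairs independently with probability p."""
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def components(n, edges):
    """Connected components of an n-node graph, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Hypothetical ground truth: two equal-size top-level clusters.
N = 200
label = [0 if i < N // 2 else 1 for i in range(N)]


def similarity(i, j):
    # Hypothetical similarity: 1.0 within a cluster, 0.0 across clusters.
    return 1.0 if label[i] == label[j] else 0.0


rng = random.Random(0)
p = 3 * math.log(N) / N  # expected observations: about 1.5 * N ln N pairs
observed = sample_pairs_at_random(N, p, rng)
strong = [(i, j) for i, j in observed if similarity(i, j) > 0.5]
groups = components(N, strong)
print(len(observed), "of", N * (N - 1) // 2, "pairs observed;",
      len(groups), "groups found")
```

Because only within-cluster pairs pass the threshold in this toy, every recovered group lies inside one true cluster; if the randomly observed pairs happen to connect each cluster, the groups coincide with the two top-level clusters, and otherwise a cluster splits into fragments.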
Taxonomy
Topics: Complex Network Analysis Techniques · Data Management and Algorithms · Advanced Clustering Algorithms Research
