Hierarchical Clustering using Randomly Selected Similarities

Brian Eriksson

arXiv:1207.4748·stat.ML·July 20, 2012

TL;DR

This paper shows that a hierarchical clustering can be reconstructed from only a small, randomly selected subset of pairwise similarities, reducing the number of similarities that must be measured.

Contribution

It introduces a method for reconstructing a hierarchical clustering from similarities observed at random, with theoretical bounds on the number of similarities needed.

Findings

01

The hierarchical clustering, down to clusters containing a constant fraction of the total number of items, can be recovered with O(N log N) pairwise similarities.

02

A significant fraction of the clustering is recoverable with fewer similarities.

03

The approach applies to scenarios where similarities are observed randomly.
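
For a sense of scale, the O(N log N) figure in the findings above can be compared with the N(N-1)/2 similarities needed to observe every pair. The following is a minimal sketch, not the paper's algorithm: it only draws a uniformly random set of item pairs and reports what fraction of all pairs that budget covers. The helper names (`total_pairs`, `sample_random_pairs`), the constant factor of 1, and the natural logarithm are assumptions made for illustration; the paper's bound hides constants that this sketch does not attempt to reproduce.

```python
import math
import random
from itertools import combinations


def total_pairs(n: int) -> int:
    """Number of distinct unordered item pairs among n items."""
    return n * (n - 1) // 2


def sample_random_pairs(n: int, m: int, seed: int = 0) -> list[tuple[int, int]]:
    """Draw m distinct unordered pairs (i, j), i < j, uniformly at random.

    Hypothetical helper: it enumerates every pair first, which is fine for
    small n but would be replaced by a streaming sampler for large n.
    """
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n), 2))
    return rng.sample(all_pairs, min(m, len(all_pairs)))


n = 1000
# Assumed budget: N * ln(N) with constant factor 1 (illustrative only).
budget = math.ceil(n * math.log(n))
pairs = sample_random_pairs(n, budget)
fraction = budget / total_pairs(n)

print(f"items: {n}")
print(f"all pairs: {total_pairs(n)}")
print(f"sampled pairs: {len(pairs)}")
print(f"fraction of pairs observed: {fraction:.4f}")
```

Under these assumptions the sampled budget is a small percentage of all pairs for N = 1000, and the gap widens as N grows because N log N grows more slowly than N².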

Abstract

The problem of hierarchically clustering items from pairwise similarities is found across various scientific disciplines, from biology to networking. Often, applications of clustering techniques are limited by the cost of obtaining similarities between pairs of items. While prior work has developed techniques that reconstruct a clustering from a significantly reduced set of pairwise similarities via adaptive measurements, these techniques are only applicable when the user can choose which similarities to observe. In this paper, we examine reconstructing a hierarchical clustering from similarities observed at random. We derive precise bounds which show that a significant fraction of the hierarchical clustering can be recovered using fewer than all the pairwise similarities. We find that the correct hierarchical clustering down to a constant fraction of the total number of items (i.e., clusters…

Taxonomy

Topics: Complex Network Analysis Techniques · Data Management and Algorithms · Advanced Clustering Algorithms Research