Hierarchical Clustering via Sketches and Hierarchical Correlation Clustering
Danny Vainstein, Vaggos Chatziafratis, Gui Citovsky, Anand Rajagopalan, Mohammad Mahdian, Yossi Azar

TL;DR
This paper advances hierarchical clustering by providing structural lemmas that enable near-optimal approximation algorithms for similarity and dissimilarity objectives, and introduces Hierarchical Correlation Clustering for mixed data types.
Contribution
It introduces structural lemmas that improve approximation guarantees for hierarchical clustering objectives and proposes Hierarchical Correlation Clustering for combined similarity and dissimilarity data.
Findings
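Approximation ratios arbitrarily close to 1 for the similarity and dissimilarity objectives, provided not all weights are small.
New structural lemmas that convert any HC tree into a tree with a constant number of internal nodes at an arbitrarily small loss in the objective.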
Approximation of 0.4767 for Hierarchical Correlation Clustering.
Abstract
Recently, Hierarchical Clustering (HC) has been considered through the lens of optimization. In particular, two maximization objectives have been defined. Moseley and Wang defined the Revenue objective to handle similarity information given by a weighted graph on the data points (w.l.o.g., [0,1] weights), while Cohen-Addad et al. defined the Dissimilarity objective to handle dissimilarity information. In this paper, we prove structural lemmas for both objectives allowing us to convert any HC tree to a tree with a constant number of internal nodes while incurring an arbitrarily small loss in each objective. Although the best-known approximations are 0.585 and 0.667 respectively, using our lemmas we obtain approximations arbitrarily close to 1, if not all weights are small (i.e., there exist constants such that the fraction of weights smaller than…
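To make the two objectives concrete, here is a minimal Python sketch that evaluates both on a small tree. It is not taken from the paper. It assumes the standard definitions: for each pair (i, j) with weight w, Revenue adds w · (n − s) and Dissimilarity adds w · s, where n is the number of data points and s is the number of leaves under the lowest common ancestor of i and j. The nested-tuple tree encoding, the function name `hc_objectives`, and the toy weights are hypothetical choices made for illustration only.

```python
from itertools import combinations


def hc_objectives(tree, weights):
    """Return (revenue, dissimilarity) of an HC tree.

    tree    -- nested tuples; leaves are ints (illustrative encoding)
    weights -- dict mapping frozenset({i, j}) to a weight in [0, 1]
    """
    def count_leaves(node):
        if not isinstance(node, tuple):
            return 1
        return sum(count_leaves(child) for child in node)

    n = count_leaves(tree)
    totals = {"revenue": 0.0, "dissimilarity": 0.0}

    def visit(node):
        # Returns the list of leaves below `node`.
        if not isinstance(node, tuple):
            return [node]
        child_leaves = [visit(child) for child in node]
        size = sum(len(group) for group in child_leaves)
        # Pairs split across two different children have `node` as their LCA.
        for left, right in combinations(child_leaves, 2):
            for i in left:
                for j in right:
                    w = weights.get(frozenset((i, j)), 0.0)
                    totals["revenue"] += w * (n - size)
                    totals["dissimilarity"] += w * size
        return [leaf for group in child_leaves for leaf in group]

    visit(tree)
    return totals["revenue"], totals["dissimilarity"]


# Toy instance: the same weights are read once as similarities (Revenue)
# and once as dissimilarities (Dissimilarity), purely for comparison.
toy_weights = {
    frozenset((0, 1)): 1.0,
    frozenset((2, 3)): 1.0,
    frozenset((0, 2)): 0.5,
}
print(hc_objectives(((0, 1), (2, 3)), toy_weights))  # (4.0, 6.0)
print(hc_objectives(((0, 2), (1, 3)), toy_weights))  # (1.0, 9.0)
```

The first tree merges the heavy pairs early, which Revenue rewards. The second tree separates them at the root, which Dissimilarity rewards. The two objectives therefore prefer opposite trees on the same weights.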
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Face and Expression Recognition
