Approximating Dasgupta Cost in Sublinear Time from a Few Random Seeds
Michael Kapralov, Akash Kumar, Silvio Lattanzi, Aida Mousavifar, Weronika Wrzos-Kaminska

TL;DR
This paper introduces a sublinear time algorithm that approximates the Dasgupta cost of hierarchical clustering on $(k, \epsilon)$-clusterable graphs using a small number of random seed vertices.
Contribution
It presents the first sublinear time algorithm for approximating Dasgupta cost in clusterable graphs using random seeds, addressing the lack of sublinear time algorithms for learning the structure of inter-cluster connections.
Findings
Achieves an $O(\sqrt{\log k})$ approximation of Dasgupta cost.
Operates in approximately $n^{1/2+O(\epsilon)}$ time with about $n^{1/3}$ seeds.
Provides a sublinear time simulation of existing clustering algorithms on clusterable graphs.
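For readers unfamiliar with the objective, the Dasgupta cost of a hierarchy $T$ on a weighted graph is $\sum_{\{u,v\} \in E} w(u,v) \cdot |\mathrm{leaves}(T[\mathrm{lca}(u,v)])|$: each edge pays the number of leaves under the lowest common ancestor of its endpoints. The sketch below evaluates this definition directly on a small graph. It is a brute-force illustration only, not the paper's sublinear algorithm; the nested-tuple tree encoding and the names `leaves` and `dasgupta_cost` are assumptions made for this example.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaves under a nested-tuple tree; any non-tuple is a leaf."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def dasgupta_cost(tree, weights):
    """Sum over edges {u, v} of w(u, v) times the leaf count under lca(u, v).

    `weights` maps frozenset({u, v}) to the edge weight (hypothetical encoding).
    """
    if not isinstance(tree, tuple):
        return 0
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(group) for group in child_leaves)
    # Edges contained in a single child are charged deeper in the tree.
    cost = sum(dasgupta_cost(child, weights) for child in tree)
    # Edges split between two children have their LCA at this node.
    for left, right in combinations(child_leaves, 2):
        for u in left:
            for v in right:
                cost += weights.get(frozenset((u, v)), 0) * size
    return cost


# Unit-weight path 0 - 1 - 2 - 3.
path = {frozenset((0, 1)): 1, frozenset((1, 2)): 1, frozenset((2, 3)): 1}
balanced = ((0, 1), (2, 3))
caterpillar = (((0, 1), 2), 3)
print(dasgupta_cost(balanced, path), dasgupta_cost(caterpillar, path))
```

On this path the balanced tree costs 2 + 2 + 4 = 8 and the caterpillar tree costs 2 + 3 + 4 = 9, so the hierarchy that cuts the path in the middle first is cheaper.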
Abstract
Testing graph cluster structure has been a central object of study in property testing since the foundational work of Goldreich and Ron [STOC'96] on expansion testing, i.e. the problem of distinguishing between a single cluster (an expander) and a graph that is far from a single cluster. More generally, a $(k, \epsilon)$-clusterable graph is a graph whose vertex set admits a partition into $k$ induced expanders, each with outer conductance bounded by $\epsilon$. A recent line of work initiated by Czumaj, Peng and Sohler [STOC'15] has shown how to test whether a graph is close to $(k, \epsilon)$-clusterable, and to locally determine which cluster a given vertex belongs to with misclassification rate $\approx \epsilon$, but no sublinear time algorithms for learning the structure of inter-cluster connections are known. As a simple example, can one locally distinguish between the…
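To make the outer conductance condition in the definition of clusterable graphs concrete, the following minimal sketch computes the fraction of edge endpoints in a vertex set that leave the set. It assumes the common volume-normalized definition, $\phi_{\mathrm{out}}(C) = |E(C, V \setminus C)| / \mathrm{vol}(C)$; the paper's exact normalization may differ (for example, dividing by $d|C|$ in a bounded-degree model), and the adjacency-list encoding and function name are hypothetical.

```python
def outer_conductance(adj, cluster):
    """Edges leaving `cluster` divided by the total degree inside `cluster`.

    `adj` maps each vertex to a list of its neighbors (undirected graph).
    """
    cluster = set(cluster)
    crossing = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    volume = sum(len(adj[u]) for u in cluster)
    return crossing / volume


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2 - 3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
phi = outer_conductance(adj, {0, 1, 2})
print(phi)
```

Here one edge crosses the cut and the triangle has volume 2 + 2 + 3 = 7, so the outer conductance is 1/7; the same holds for the other triangle by symmetry.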
