Stars: Tera-Scale Graph Building for Clustering and Graph Learning
CJ Carey, Jonathan Halcrow, Rajesh Jayaram, Vahab Mirrokni, Warren Schudy, Peilin Zhong

TL;DR
Stars is a scalable method for constructing extremely sparse similarity graphs at tera-scale, enabling efficient clustering and graph learning with significant reductions in comparisons and runtime.
Contribution
The paper introduces Stars, a novel two-hop spanner-based graph construction method that is highly scalable and reduces similarity comparisons for large datasets.
Findings
Constructs graphs with tens of trillions of edges at tera-scale.
Achieves 10-1000x reduction in similarity comparisons.
Provides 2-10x faster graph building without quality loss.
Abstract
A fundamental procedure in the analysis of massive datasets is the construction of similarity graphs. Such graphs play a key role for many downstream tasks, including clustering, classification, graph learning, and nearest neighbor search. For these tasks, it is critical to build graphs which are sparse yet still representative of the underlying data. The benefits of sparsity are twofold: firstly, constructing dense graphs is infeasible in practice for large datasets, and secondly, the runtime of downstream tasks is directly influenced by the sparsity of the similarity graph. In this work, we present Stars: a highly scalable method for building extremely sparse graphs via two-hop spanners, which are graphs where similar points are connected by a path of length at most two. Stars can construct two-hop spanners with significantly fewer similarity comparisons, which are a major…
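To make the two-hop idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical `bucket` of points already believed to be mutually similar (for example, points that landed in the same hash bucket) and compares connecting every pair with connecting each point to a single leader. The function names `clique_edges`, `star_edges`, and `within_two_hops` are illustrative only.

```python
from itertools import combinations


def clique_edges(bucket):
    """All-pairs edges within a bucket: k*(k-1)/2 comparisons for k points."""
    return {frozenset(pair) for pair in combinations(bucket, 2)}


def star_edges(bucket, leader=None):
    """Edges from one leader to every other member: k-1 comparisons.

    Assumption: the leader defaults to the first point; how a leader is
    chosen in practice is not specified here.
    """
    if not bucket:
        return set()
    leader = bucket[0] if leader is None else leader
    return {frozenset((leader, v)) for v in bucket if v != leader}


def within_two_hops(edges, u, v):
    """Return True if u and v are adjacent or share a common neighbor."""
    neighbors = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    if v in neighbors.get(u, set()):
        return True
    return bool(neighbors.get(u, set()) & neighbors.get(v, set()))


# Hypothetical bucket of six point ids.
bucket = list(range(6))
star = star_edges(bucket)
clique = clique_edges(bucket)
```

In this toy setting the star uses far fewer edges than the clique, yet every pair in the bucket is still joined by a path of length at most two through the leader, which is the property the abstract attributes to two-hop spanners.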
Taxonomy
Topics: Advanced Graph Neural Networks · Complex Network Analysis Techniques · Caching and Content Delivery
