UMAP Is Spectral Clustering on the Fuzzy Nearest-Neighbor Graph
Yang Yang

TL;DR
This paper proves that UMAP performs spectral clustering on the fuzzy nearest-neighbor graph, providing a theoretical foundation that unifies it with contrastive learning and spectral methods.
Contribution
It establishes a formal connection between UMAP, spectral clustering, and contrastive learning, clarifying the theoretical basis of UMAP's effectiveness.
Findings
UMAP's stochastic optimization is a contrastive learning objective.
Contrastive learning on a similarity graph is equivalent to spectral clustering.
UMAP's spectral initialization computes the exact linear solution to the spectral problem.
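As a rough illustration of the spectral side of these findings, the sketch below builds a symmetric k-nearest-neighbour graph and takes the low eigenvectors of its normalized Laplacian, which is the kind of linear solution a spectral initialization computes. It is not the paper's code: the 0/1 affinity is a simplified stand-in for UMAP's fuzzy edge weights, and the helper names `knn_affinity` and `spectral_embedding` are hypothetical.

```python
import numpy as np

def knn_affinity(X, k):
    """Symmetric 0/1 kNN affinity matrix (assumption: a simplified
    stand-in for UMAP's fuzzy nearest-neighbour weights)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)           # a point is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]    # indices of the k nearest neighbours
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)              # symmetrize: keep an edge if either end chose it

def spectral_embedding(A, dim):
    """Eigenvalues of the symmetric normalized Laplacian, plus the
    eigenvectors at positions 1..dim (the first one is skipped)."""
    deg = A.sum(axis=1)                    # every node has degree >= k > 0
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    return vals, vecs[:, 1:dim + 1]

# Two well-separated blobs: no kNN edge crosses between them, so the graph
# has at least two connected components and the Laplacian has at least two
# (near-)zero eigenvalues.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
A = knn_affinity(X, k=5)
vals, Y = spectral_embedding(A, dim=1)
print(vals[:3])
```

The count of near-zero eigenvalues reflects the number of connected components, which is the basic mechanism by which a Laplacian eigenproblem exposes cluster structure in a neighbour graph.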
Abstract
UMAP (Uniform Manifold Approximation and Projection) is among the most widely used algorithms for nonlinear dimensionality reduction and data visualisation. Despite its popularity, and despite being presented through the lens of algebraic topology, the exact relationship between UMAP and classical spectral methods has remained informal. In this work, we prove that UMAP performs spectral clustering on the fuzzy k-nearest-neighbour graph. Our proof proceeds in three steps: (1) we show that UMAP's stochastic optimisation with negative sampling is a contrastive learning objective on the similarity graph; (2) we invoke the result of HaoChen et al. [8], establishing that contrastive learning on a similarity graph is equivalent to spectral clustering; and (3) we verify that UMAP's spectral initialisation computes the exact linear solution to this spectral problem. The equivalence is exact for…
Taxonomy
Topics: Topological and Geometric Data Analysis · Advanced Graph Neural Networks · Graph Theory and Algorithms
