FastUMAP: Scalable Dimensionality Reduction via Bipartite Landmark Sampling
Hongmin Li

TL;DR
FastUMAP is a landmark-based, scalable dimensionality reduction method optimized for repeated exploratory analysis, significantly reducing runtime while maintaining competitive accuracy.
Contribution
Introduces FastUMAP, a landmark-sampling approach that accelerates UMAP-style embedding in repeated-use settings, with the landmark ratio r = m/n controlling the fidelity-runtime trade-off.
Findings
FastUMAP achieves the lowest runtime on 7 out of 9 benchmark datasets.
On MNIST and Fashion-MNIST, it runs in about 4.6 seconds, much faster than Barnes-Hut t-SNE.
It maintains high accuracy, with 91.4% mean kNN accuracy on large datasets.
Abstract
Exploratory analysis of high-dimensional data rarely stops at a single embedding. In practice, analysts rerun dimensionality reduction after changing preprocessing, subsets, or hyperparameters, and standard nonlinear methods can quickly become the bottleneck. We introduce FastUMAP (Bipartite Manifold Approximation and Projection), a landmark-based method designed for this repeated-use setting. FastUMAP builds a sparse point-landmark fuzzy graph, computes a Nyström spectral warm start from the induced landmark affinity, and then refines all sample coordinates with a UMAP-style objective on the bipartite graph. The landmark ratio r = m/n provides a direct way to trade runtime against fidelity. On 9 benchmark datasets spanning 178 to 70,000 samples, FastUMAP has the lowest runtime on 7 datasets in our reported default-implementation comparison on one workstation. On MNIST and Fashion-MNIST…
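The first two stages named in the abstract (a sparse point-landmark graph, then a spectral warm start) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The function name `landmark_spectral_init`, the uniform landmark sampling, the Gaussian edge weights, and the SVD-based embedding (a landmark spectral technique standing in for the Nyström step) are all assumptions chosen for illustration. The UMAP-style refinement stage is omitted.

```python
import numpy as np

def landmark_spectral_init(X, ratio=0.1, k=5, dim=2, seed=0):
    """Hypothetical sketch: bipartite point-landmark graph + spectral warm start.

    Assumptions (not from the paper): uniform landmark sampling, Gaussian
    weights with a per-point bandwidth, and an SVD of the normalized
    bipartite matrix in place of the Nystrom computation.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Landmark count m from the ratio r = m / n, kept within sensible bounds.
    m = min(n, max(dim + 2, int(round(ratio * n))))
    landmarks = rng.choice(n, size=m, replace=False)
    L = X[landmarks]
    k = min(k, m)

    # Squared Euclidean distances from every point to every landmark (n x m).
    d2 = (X ** 2).sum(axis=1)[:, None] - 2.0 * X @ L.T + (L ** 2).sum(axis=1)[None, :]
    d2 = np.maximum(d2, 0.0)

    # Keep only each point's k nearest landmarks, which makes the graph sparse.
    nn = np.argpartition(d2, k - 1, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    nn_d2 = d2[rows, cols].reshape(n, k)

    # Gaussian edge weights; the bandwidth is the mean squared neighbor distance.
    sigma2 = nn_d2.mean(axis=1, keepdims=True) + 1e-12
    B = np.zeros((n, m))
    B[rows, cols] = np.exp(-nn_d2 / sigma2).ravel()
    B /= B.sum(axis=1, keepdims=True)  # each row sums to 1

    # Scale columns by landmark degree, then take left singular vectors.
    # Bn @ Bn.T is the point-point affinity induced through the landmarks.
    col_deg = B.sum(axis=0) + 1e-12
    Bn = B / np.sqrt(col_deg)
    U, _, _ = np.linalg.svd(Bn, full_matrices=False)

    # Skip the leading singular vector, which is constant for a connected graph.
    return U[:, 1:dim + 1], B, landmarks
```

In this sketch a smaller `ratio` shrinks the n-by-m matrix and so the cost of the decomposition, at the price of a coarser approximation, which is the runtime-fidelity trade-off the abstract attributes to r = m/n.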
