DiRe-RAPIDS: Topology-faithful dimensionality reduction at scale
Alexander Kolpakov, Igor Rivin

TL;DR
DiRe-RAPIDS is a scalable dimensionality reduction method that preserves the global topology and topological features of high-dimensional data better than UMAP, especially at large scale.
Contribution
The paper introduces a topology-faithfulness benchmark built on noisy manifolds with known homology and shows that DiRe-RAPIDS preserves topological structure at scale better than UMAP.
Findings
Pareto-optimal DiRe configurations match or exceed GPU-accelerated UMAP on classification tasks.
DiRe recovers exact first Betti numbers on stress tests.
DiRe preserves 3-4 times more topological structure than UMAP on 723K arXiv paper embeddings, at comparable wall-clock time.
Abstract
Dimensionality reduction methods such as UMAP and t-SNE are central tools for visualising high-dimensional data, but their local-neighborhood objectives can preserve sampling noise while distorting global topology. We show that standard local metrics reward this noise memorisation: top-performing embeddings invent cycles and disconnected islands absent from the data. We introduce a topology-faithfulness benchmark based on noisy manifolds with known homology, tune DiRe against it, and find Pareto-optimal configurations that match or beat GPU-accelerated UMAP on classification while recovering exact first Betti numbers on stress tests. On 723K arXiv paper embeddings, DiRe preserves 3-4 times more topological structure than UMAP at comparable wall-clock time.
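To make "noisy manifolds with known homology" concrete, here is a minimal sketch of one such test set: a unit circle (one connected component, one loop, so Betti numbers b0 = 1 and b1 = 1) placed isometrically in a higher-dimensional space and perturbed with Gaussian noise. This is an illustration only, not the authors' benchmark; the function name noisy_circle and its parameters (n_points, ambient_dim, noise, seed) are hypothetical.

```python
import numpy as np


def noisy_circle(n_points=500, ambient_dim=50, noise=0.05, seed=0):
    """Sample a unit circle (b0 = 1, b1 = 1) in R^ambient_dim with Gaussian noise.

    Hypothetical helper for illustration; not taken from the paper.
    Returns (noisy, clean), each of shape (n_points, ambient_dim).
    """
    rng = np.random.default_rng(seed)

    # Points on the unit circle in the plane.
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n_points)
    circle = np.column_stack([np.cos(theta), np.sin(theta)])

    # Random orthonormal 2-frame in the ambient space (reduced QR),
    # so the embedding preserves distances on the clean circle.
    frame, _ = np.linalg.qr(rng.standard_normal((ambient_dim, 2)))
    clean = circle @ frame.T

    # Isotropic Gaussian noise in every ambient coordinate.
    noisy = clean + noise * rng.standard_normal(clean.shape)
    return noisy, clean


X, X_clean = noisy_circle()
```

An embedding of X into two dimensions is topology-faithful in the sense above if it still shows a single connected loop; extra cycles or disconnected islands would be artefacts of the reduction rather than features of the data.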
