Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMAP, and PaCMAP for Data Visualization
Yingfan Wang, Haiyang Huang, Cynthia Rudin, Yaron Shaposhnik

TL;DR
This paper empirically analyzes popular dimension reduction (DR) techniques such as t-SNE, UMAP, and TriMAP, and introduces PaCMAP, a new method that better preserves both local and global data structure for visualization.
Contribution
It provides a detailed understanding of the mechanisms behind DR methods and introduces PaCMAP, a novel algorithm that balances local and global structure preservation.
Findings
Insights into the importance of preserving local vs. global structure
Design principles for DR loss functions based on empirical analysis
PaCMAP effectively preserves both local and global data structures
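The findings are framed in terms of "local" versus "global" structure preservation. To make those two notions concrete, here is a minimal sketch of two generic proxies: k-nearest-neighbor overlap (local) and random triplet agreement (global). These metrics, and all function names below, are illustrative assumptions. They are common ways to score an embedding, but the paper's own evaluation protocol may differ. The code is pure Python and O(n² log n), so it is only meant for small toy datasets.

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_indices(points, i, k):
    # Indices of the k nearest neighbors of point i, excluding i itself.
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: sq_dist(points[i], points[j]))
    return set(others[:k])


def knn_preservation(high, low, k=5):
    # Local-structure proxy (assumed metric): average fraction of each point's
    # k high-dimensional neighbors that remain among its k low-dimensional
    # neighbors. 1.0 means every neighborhood is kept intact.
    total = 0.0
    for i in range(len(high)):
        kept = knn_indices(high, i, k) & knn_indices(low, i, k)
        total += len(kept) / k
    return total / len(high)


def random_triplet_accuracy(high, low, n_triplets=1000, seed=0):
    # Global-structure proxy (assumed metric): fraction of random triplets
    # (i, j, l) whose ordering "d(i, j) < d(i, l)" is the same in both spaces.
    rng = random.Random(seed)
    n = len(high)
    agree = 0
    for _ in range(n_triplets):
        i, j, l = rng.sample(range(n), 3)
        high_order = sq_dist(high[i], high[j]) < sq_dist(high[i], high[l])
        low_order = sq_dist(low[i], low[j]) < sq_dist(low[i], low[l])
        agree += high_order == low_order
    return agree / n_triplets


if __name__ == "__main__":
    # Two tight 3-D clusters plus an outlier; project by dropping the z-axis.
    high = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
            (5, 5, 5), (6, 5, 5), (5, 6, 5), (10, 0, 3)]
    low = [(x, y) for x, y, _ in high]
    print(knn_preservation(high, low, k=2))
    print(random_triplet_accuracy(high, low))
```

A method can score well on one proxy and poorly on the other, which is one way to observe the local/global trade-off described in the abstract.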
Abstract
Dimension reduction (DR) techniques such as t-SNE, UMAP, and TriMAP have demonstrated impressive visualization performance on many real-world datasets. One tension that has always faced these methods is the trade-off between preservation of global structure and preservation of local structure: these methods can handle one or the other, but not both. In this work, our main goal is to understand what aspects of DR methods are important for preserving both local and global structure: it is difficult to design a better method without a true understanding of the choices we make in our algorithms and their empirical impact on the lower-dimensional embeddings they produce. Towards the goal of local structure preservation, we provide several useful design principles for DR loss functions based on our new understanding of the mechanisms behind successful DR methods. Towards the goal of…
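The abstract refers to design principles for DR loss functions. As a rough illustration of the kind of object being designed, here is a minimal sketch of a pairwise loss that attracts neighbor pairs and repels sampled non-neighbor pairs. Everything here is a hypothetical assumption for illustration: the helper names, the bounded forms `d / (c + d)` (attraction) and `1 / (1 + d)` (repulsion), the shift `d = squared distance + 1`, and the constant `c_attract=10.0`. This is not the paper's exact loss or weighting schedule.

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_further_pairs(n, neighbor_pairs, per_point=2, seed=0):
    # Hypothetical helper: for each point, draw random partners that are not
    # its listed neighbors. These serve as the repulsive pairs.
    rng = random.Random(seed)
    forbidden = {frozenset(p) for p in neighbor_pairs}
    pairs = []
    for i in range(n):
        candidates = [j for j in range(n)
                      if j != i and frozenset((i, j)) not in forbidden]
        for j in rng.sample(candidates, min(per_point, len(candidates))):
            pairs.append((i, j))
    return pairs


def pair_loss(embedding, neighbor_pairs, further_pairs, c_attract=10.0):
    # Attractive term (assumed form): grows with distance but saturates at 1,
    # so neighbors that are already far apart are not pulled arbitrarily hard.
    attract = 0.0
    for i, j in neighbor_pairs:
        d = sq_dist(embedding[i], embedding[j]) + 1.0
        attract += d / (c_attract + d)
    # Repulsive term (assumed form): shrinks with distance, so mainly the
    # non-neighbors that sit close together are pushed apart.
    repel = 0.0
    for i, j in further_pairs:
        d = sq_dist(embedding[i], embedding[j]) + 1.0
        repel += 1.0 / (1.0 + d)
    return attract + repel


if __name__ == "__main__":
    neighbors = [(0, 1)]
    further = [(0, 2)]
    good = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0)]  # neighbor close, other far
    bad = [(0.0, 0.0), (5.0, 0.0), (0.1, 0.0)]   # neighbor far, other close
    print(pair_loss(good, neighbors, further), pair_loss(bad, neighbors, further))
```

An embedding that keeps neighbors together and separates non-neighbors gets a lower loss. Choosing which pairs to include, and how each term saturates, is exactly the kind of design decision the paper analyzes empirically.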
Taxonomy
Topics: Data Visualization and Analytics · Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification
