Navigating Perplexity: A linear relationship with the data set size in t-SNE embeddings
Martin Skrodzki, Nicolas F. Chaves-de-Plaza, Thomas Höllt, Elmar Eisemann, Klaus Hildebrandt

TL;DR
This paper reveals a linear relationship between perplexity and dataset size in t-SNE embeddings, enabling more consistent visualizations across different data scales and guiding better hyperparameter choices.
Contribution
It uncovers a linear scaling rule for perplexity relative to dataset size, improving the understanding and application of t-SNE visualizations.
Findings
Embeddings are structurally consistent when perplexity scales linearly with data size.
Qualitative and quantitative results support the linear relationship.
The relationship guides users in selecting perplexity when visualizing high-dimensional data.
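As a rough illustration of the linear rule, the sketch below rescales a perplexity that worked on a reference data set so that it keeps the same ratio to the number of points on a smaller or larger sample. The function name `scaled_perplexity`, its parameters, and the clamp to one less than the sample size are assumptions made for this example; this summary does not state the paper's exact formula or constants.

```python
def scaled_perplexity(ref_perplexity: float, ref_size: int, sample_size: int) -> float:
    """Scale a reference perplexity linearly with the number of points.

    Hypothetical helper: keeps perplexity / data-set-size constant,
    which is one straightforward reading of a linear relationship.
    """
    if ref_size < 2 or sample_size < 2:
        raise ValueError("both data sets need at least two points")
    if ref_perplexity <= 0:
        raise ValueError("perplexity must be positive")

    # Multiply first, then divide, to keep the same perplexity-to-size ratio.
    perplexity = ref_perplexity * sample_size / ref_size

    # Assumption: cap the value below the number of points, since a
    # perplexity at or above the point count is not meaningful.
    return min(perplexity, float(sample_size - 1))


if __name__ == "__main__":
    # A perplexity of 30 on 10,000 points, rescaled for a 5,000-point sample.
    print(scaled_perplexity(30, 10_000, 5_000))
```

The returned value could then be passed as the `perplexity` argument of a t-SNE implementation such as `sklearn.manifold.TSNE`, so that embeddings of differently sized samples of the same data are computed with matched settings.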
Abstract
Widely used pipelines for analyzing high-dimensional data utilize two-dimensional visualizations. These are created, for instance, via t-distributed stochastic neighbor embedding (t-SNE). A crucial element of the t-SNE embedding procedure is the perplexity hyperparameter. That is because the embedding structure varies when perplexity is changed. A suitable perplexity choice depends on the data set and the intended usage for the embedding. Therefore, perplexity is often chosen based on heuristics, intuition, and prior experience. This paper uncovers a linear relationship between perplexity and the data set size. Namely, we show that embeddings remain structurally consistent across data set samples when perplexity is adjusted accordingly. Qualitative and quantitative experimental results support these findings. This informs the visualization process, guiding the user when choosing a…
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Data Visualization and Analytics
