On the continuum limit of t-SNE for data visualization
Jeff Calder, Zhonggan Huang, Ryan Murray, Adam Pickarski

TL;DR
This paper investigates the theoretical continuum limit of t-SNE, revealing its connection to a non-convex variational problem and explaining its empirical data separation capabilities.
Contribution
It provides the first rigorous analysis of t-SNE's continuum limit, linking it to a variational problem with non-convex regularization and exploring its well-posedness.
Findings
Asymptotic consistency of the t-SNE Kullback-Leibler divergence with a continuum variational problem as the number of data points grows large
Existence of a unique smooth minimizer in 1D cases
Alignment of t-SNE behavior with the ill-posed Perona-Malik equation
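For reference, the Perona-Malik equation mentioned above is usually written in the following standard form. The symbols u, g, and K are the conventional textbook notation, not necessarily the notation used in the paper.

```latex
% Perona-Malik equation (standard form; notation assumed, not taken from the paper)
\[
  \partial_t u = \operatorname{div}\!\big( g(|\nabla u|)\, \nabla u \big),
  \qquad
  g(s) = \frac{1}{1 + s^2/K^2}, \quad K > 0 .
\]
% In one dimension this reads u_t = (\varphi(u_x))_x with flux
\[
  \varphi(s) = \frac{s}{1 + s^2/K^2},
  \qquad
  \varphi'(s) = \frac{1 - s^2/K^2}{\big(1 + s^2/K^2\big)^2},
\]
% so \varphi'(s) < 0 whenever |s| > K: the equation diffuses backward
% across steep gradients, which is the source of its ill-posedness.
```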
Abstract
This work is concerned with the continuum limit of a graph-based data visualization technique called the t-Distributed Stochastic Neighbor Embedding (t-SNE), which is widely used for visualizing data in a variety of applications, but is still poorly understood from a theoretical standpoint. The t-SNE algorithm produces visualizations by minimizing the Kullback-Leibler divergence between similarity matrices representing the high dimensional data and its low dimensional representation. We prove that as the number of data points n → ∞, after a natural rescaling and in applicable parameter regimes, the Kullback-Leibler divergence is consistent as the number of data points n → ∞ and the similarity graph remains sparse with a continuum variational problem that involves a non-convex gradient regularization term and a penalty on the magnitude of the probability density…
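The discrete objective described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's setup: it assumes a single fixed Gaussian bandwidth `sigma` in place of the per-point perplexity calibration used by standard t-SNE, it ignores the rescaling and sparse-graph regime the paper analyzes, and all function names are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)  # clip tiny negatives from round-off

def high_dim_similarities(X, sigma=1.0):
    """Symmetrized Gaussian similarities P (assumption: one fixed bandwidth)."""
    n = X.shape[0]
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                      # no self-similarity
    P_cond = W / W.sum(axis=1, keepdims=True)     # row-normalized p_{j|i}
    return (P_cond + P_cond.T) / (2.0 * n)        # symmetric, sums to 1

def low_dim_similarities(Y):
    """Student-t (one degree of freedom) similarities Q of the embedding."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def tsne_kl(P, Q, eps=1e-12):
    """KL(P || Q) summed over pairs with P > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Toy data: 50 points in 10 dimensions and a random 2-D embedding.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
Y = rng.normal(size=(50, 2))

P = high_dim_similarities(X, sigma=1.0)
Q = low_dim_similarities(Y)
print("KL(P || Q) for a random embedding:", tsne_kl(P, Q))
```

t-SNE minimizes this quantity over the embedding `Y`. In the gradient, the Gaussian similarities `P` contribute the attraction between neighboring points, and the normalization of `Q` contributes the repulsion.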
