On the continuum limit of t-SNE for data visualization

Jeff Calder; Zhonggan Huang; Ryan Murray; Adam Pickarski

arXiv:2604.12041 · stat.ML · April 15, 2026

TL;DR

This paper investigates the theoretical continuum limit of t-SNE, revealing its connection to a non-convex variational problem and explaining its empirical data separation capabilities.

Contribution

It provides the first rigorous analysis of t-SNE's continuum limit, linking it to a variational problem with non-convex regularization and exploring its well-posedness.

Findings

01

Asymptotic consistency of the t-SNE Kullback-Leibler divergence as the number of data points grows large

02

Existence of a unique smooth minimizer in 1D cases

03

Alignment of t-SNE behavior with the ill-posed Perona-Malik equation

Abstract

This work is concerned with the continuum limit of a graph-based data visualization technique called t-Distributed Stochastic Neighbor Embedding (t-SNE), which is widely used for visualizing data in a variety of applications but is still poorly understood from a theoretical standpoint. The t-SNE algorithm produces visualizations by minimizing the Kullback-Leibler divergence between similarity matrices representing the high-dimensional data and its low-dimensional representation. We prove that, as the number of data points $n \to \infty$ while the similarity graph remains sparse, after a natural rescaling and in applicable parameter regimes, the Kullback-Leibler divergence is consistent with a continuum variational problem that involves a non-convex gradient regularization term and a penalty on the magnitude of the probability density…
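To make the objective described in the abstract concrete, the following is a minimal sketch of the discrete t-SNE Kullback-Leibler divergence in NumPy. It is an illustration under stated assumptions, not the paper's construction: the high-dimensional similarities use a dense Gaussian kernel with a single fixed bandwidth `sigma` (standard t-SNE calibrates a per-point bandwidth from a perplexity target, and the paper's analysis concerns a sparse similarity graph), and all function names are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.fill_diagonal(D, 0.0)
    # Clip tiny negative values caused by floating-point round-off.
    return np.maximum(D, 0.0)


def high_dim_affinities(X, sigma=1.0):
    """Symmetrized Gaussian similarities P (assumption: one fixed bandwidth)."""
    n = X.shape[0]
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                     # no self-similarity
    cond = W / W.sum(axis=1, keepdims=True)      # row-normalized p_{j|i}
    return (cond + cond.T) / (2.0 * n)           # entries sum to 1


def low_dim_affinities(Y):
    """Student-t (one degree of freedom) similarities Q of the embedding."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()                           # entries sum to 1


def tsne_kl(P, Q, eps=1e-12):
    """KL(P || Q) summed over pairs with P > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


# Usage: random data and a random 2D embedding (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Y = rng.normal(size=(50, 2))
P = high_dim_affinities(X, sigma=1.0)
Q = low_dim_affinities(Y)
print("KL(P || Q) =", tsne_kl(P, Q))
```

t-SNE searches for the embedding `Y` that minimizes this quantity, typically by gradient descent; the abstract's result concerns how a rescaled version of it behaves as the number of points grows.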
