# Clustering with t-SNE, provably

**Authors:** George C. Linderman, Stefan Steinerberger

arXiv: 1706.02582 · 2017-06-09

## TL;DR

This paper proves that t-SNE recovers well-separated clusters during its early exaggeration phase. The proof also yields rules for choosing the exaggeration parameter $\alpha$ and the step size $h$.

## Contribution

It gives a rigorous analysis of t-SNE's ability to recover clusters and, based on the proof, proposes new rules for setting the exaggeration parameter $\alpha$ and the step size $h$.
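For reference, the two parameters enter the optimization as follows. This is the textbook t-SNE gradient (van der Maaten & Hinton, 2008) with the input affinities $p_{ij}$ multiplied by the exaggeration factor $\alpha$, followed by a plain gradient step of size $h$. The paper's own normalization (for example, whether the factor 4 is absorbed into $h$) may differ.

$$
\frac{\partial C}{\partial y_i} = 4\sum_{j \neq i} \left(\alpha\, p_{ij} - q_{ij}\right) \frac{y_i - y_j}{1 + \|y_i - y_j\|^2},
\qquad
q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}},
\qquad
y_i \leftarrow y_i - h\, \frac{\partial C}{\partial y_i}.
$$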

## Key findings

- t-SNE provably recovers well-separated clusters during the early exaggeration phase
- Parameter rules derived from the proof improve embedding quality, notably for topological structures such as the swiss roll
- A connection between t-SNE and spectral clustering methods is discussed
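The first finding can be illustrated numerically. The sketch below runs plain gradient descent on the t-SNE objective with exaggerated affinities $\alpha p_{ij}$ on two far-apart Gaussian blobs. It is a minimal sketch under stated assumptions, not the authors' code: the input affinities use a single fixed Gaussian bandwidth `sigma` instead of perplexity calibration, there is no momentum or gain schedule, and the values `alpha=12.0`, `h=1.0`, `n_iter=200` as well as all function names are hypothetical illustrative choices, not the parameter rules proposed in the paper.

```python
import numpy as np


def input_affinities(X, sigma=2.0):
    """Symmetric t-SNE-style affinities p_ij from a fixed-bandwidth Gaussian.

    Assumption: real t-SNE calibrates a per-point bandwidth to a target
    perplexity; one fixed sigma keeps this sketch short.
    """
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(K, 0.0)
    P_cond = K / K.sum(axis=1, keepdims=True)  # p_{j|i}; each row sums to 1
    return (P_cond + P_cond.T) / (2.0 * n)     # p_ij; all entries sum to 1


def early_exaggeration(P, alpha=12.0, h=1.0, n_iter=200, seed=0):
    """Plain gradient descent on KL(alpha*P || Q); alpha, h are illustrative."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    Y = 1e-4 * rng.standard_normal((n, 2))  # small random initialization
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]        # y_i - y_j
        W = 1.0 / (1.0 + np.sum(diff**2, axis=-1))  # Student-t kernel
        np.fill_diagonal(W, 0.0)
        Q = W / W.sum()                             # output affinities q_ij
        M = (alpha * P - Q) * W
        # dC/dy_i = 4 * sum_j (alpha p_ij - q_ij) w_ij (y_i - y_j)
        grad = 4.0 * np.einsum("ij,ijk->ik", M, diff)
        Y = Y - h * grad
    return Y


def demo(n_per_cluster=50, seed=0):
    """Two well-separated 2-D Gaussian blobs (hypothetical toy data)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.standard_normal((n_per_cluster, 2)),
        rng.standard_normal((n_per_cluster, 2)) + 20.0,  # far-away second blob
    ])
    labels = np.repeat([0, 1], n_per_cluster)
    Y = early_exaggeration(input_affinities(X), seed=seed)
    return Y, labels
```

Calling `demo()` returns the embedding and the blob labels. One would expect each blob to shrink to (numerically) a single location, while the two blobs drift apart only slowly, since nothing but the weak repulsive $q_{ij}$ terms pushes them away from each other.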

## Abstract

t-distributed Stochastic Neighborhood Embedding (t-SNE), a clustering and visualization method proposed by van der Maaten & Hinton in 2008, has rapidly become a standard tool in a number of natural sciences. Despite its overwhelming success, there is a distinct lack of mathematical foundations and the inner workings of the algorithm are not well understood. The purpose of this paper is to prove that t-SNE is able to recover well-separated clusters; more precisely, we prove that t-SNE in the 'early exaggeration' phase, an optimization technique proposed by van der Maaten & Hinton (2008) and van der Maaten (2014), can be rigorously analyzed. As a byproduct, the proof suggests novel ways for setting the exaggeration parameter $\alpha$ and step size $h$. Numerical examples illustrate the effectiveness of these rules: in particular, the quality of embedding of topological structures (e.g. the swiss roll) improves. We also discuss a connection to spectral clustering methods.
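One way to read the spectral-clustering connection (our paraphrase, not a statement of the paper's theorem): early in the optimization the embedding is tiny, so the Student-t weights are close to 1 and each $q_{ij}$ is about $1/(n(n-1))$, small next to $\alpha p_{ij}$ inside a cluster when $\alpha$ is large enough. Dropping the $q_{ij}$ terms turns the gradient step into the linear map $Y \leftarrow (I - 4\alpha h L)\,Y$, where $L = D - P$ is the graph Laplacian of the affinities. Repeating it acts like a power iteration: it damps every non-constant mode within a cluster and keeps the cluster indicator vectors, which span the null space of $L$ when each cluster is its own connected component. The sketch checks this on a hypothetical two-clique affinity matrix; the function names and the values `alpha=12.0`, `h=1.0` are illustrative assumptions.

```python
import numpy as np


def laplacian(P):
    """Graph Laplacian L = D - P of a symmetric affinity matrix P."""
    return np.diag(P.sum(axis=1)) - P


def linearized_early_exaggeration(P, Y0, alpha=12.0, h=1.0, n_iter=200):
    """Iterate Y <- (I - 4*alpha*h*L) Y: the exaggerated t-SNE step with the
    repulsive q_ij terms dropped and the Student-t weights set to 1."""
    A = np.eye(P.shape[0]) - 4.0 * alpha * h * laplacian(P)
    Y = Y0.copy()
    for _ in range(n_iter):
        Y = A @ Y
    return Y


def cluster_mean_projection(Y0, labels):
    """Orthogonal projection of Y0 onto the span of the cluster indicator
    vectors, i.e. every point is replaced by the mean of its cluster."""
    Y = np.empty_like(Y0)
    for k in np.unique(labels):
        Y[labels == k] = Y0[labels == k].mean(axis=0)
    return Y


def two_clique_affinities(m):
    """Hypothetical toy input: two disconnected cliques of m points each,
    uniform affinities inside a clique, zero across, entries summing to 1."""
    n = 2 * m
    P = np.zeros((n, n))
    for start in (0, m):
        P[start:start + m, start:start + m] = 1.0
    np.fill_diagonal(P, 0.0)
    return P / P.sum(), np.repeat([0, 1], m)
```

For example, with `P, labels = two_clique_affinities(50)` and a random `Y0` of shape `(100, 2)`, `linearized_early_exaggeration(P, Y0)` should agree with `cluster_mean_projection(Y0, labels)` up to rounding: each clique collapses to the mean of its starting positions, which is exactly the cluster-indicator structure that spectral clustering reads off the Laplacian's null space.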

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.02582/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1706.02582/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/1706.02582/full.md

---
Source: https://tomesphere.com/paper/1706.02582