Attraction-Repulsion Spectrum in Neighbor Embeddings
Jan Niklas Böhm, Philipp Berens, Dmitry Kobak

TL;DR
This paper explores how adjusting the balance between attraction and repulsion in neighbor embedding algorithms like t-SNE, UMAP, and ForceAtlas2 affects their ability to represent data structures, revealing a spectrum of embeddings with different properties.
Contribution
It introduces the attraction-repulsion spectrum in neighbor embeddings and explains how different algorithms correspond to different points on this spectrum, highlighting inherent trade-offs.
Findings
Stronger attraction improves continuous manifold representation.
Stronger repulsion enhances cluster separation and kNN recall.
UMAP and ForceAtlas2 embeddings correspond to t-SNE with increased attraction.
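The kNN recall mentioned in the findings can be illustrated with a small sketch. It assumes a common definition: the fraction of each point's k nearest neighbors in the original space that are also among its k nearest neighbors in the embedding, averaged over all points. The function names `knn_indices` and `knn_recall` are hypothetical, and the brute-force search is only suitable for tiny inputs; it is not the authors' implementation.

```python
def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest
    neighbors under Euclidean distance (brute force, O(n^2))."""
    result = []
    for i, p in enumerate(points):
        dists = [
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        ]
        dists.sort()
        result.append({j for _, j in dists[:k]})
    return result


def knn_recall(high_dim, low_dim, k):
    """Average fraction of high-dimensional kNN preserved in the embedding.
    Assumed definition; both inputs list the same points in the same order."""
    hi = knn_indices(high_dim, k)
    lo = knn_indices(low_dim, k)
    shared = sum(len(a & b) for a, b in zip(hi, lo))
    return shared / (k * len(high_dim))


# Toy data: four points on a line, and a 1D "embedding" that scrambles them.
original = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
scrambled = [(0.0,), (10.0,), (1.0,), (11.0,)]
recall_same = knn_recall(original, original, k=1)
recall_scrambled = knn_recall(original, scrambled, k=1)
```

A recall close to 1 means local neighborhoods survive the embedding; a value close to 0 means they do not.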
Abstract
Neighbor embeddings are a family of methods for visualizing complex high-dimensional datasets using kNN graphs. To find the low-dimensional embedding, these algorithms combine an attractive force between neighboring pairs of points with a repulsive force between all points. One of the most popular examples of such algorithms is t-SNE. Here we empirically show that changing the balance between the attractive and the repulsive forces in t-SNE using the exaggeration parameter yields a spectrum of embeddings, which is characterized by a simple trade-off: stronger attraction can better represent continuous manifold structures, while stronger repulsion can better represent discrete cluster structures and yields higher kNN recall. We find that UMAP embeddings correspond to t-SNE with increased attraction; mathematical analysis shows that this is because the negative sampling optimisation…
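To make the balance between the two forces concrete, here is a minimal sketch of how an exaggeration factor can scale attraction relative to repulsion. It assumes the standard t-SNE gradient with a Student-t kernel, w_ij = 1 / (1 + ||y_i - y_j||^2) and q_ij = w_ij / Z, and omits the constant factor of 4. The function name `tsne_forces` and the parameter name `rho` are hypothetical; this is a toy illustration, not the authors' code or any library's API.

```python
def tsne_forces(Y, P, rho=1.0):
    """Return (attractive, repulsive) force vectors for each embedding point.

    Y   : list of low-dimensional coordinates (tuples of equal length)
    P   : n x n nested list of high-dimensional affinities p_ij
    rho : exaggeration factor (assumed name); rho > 1 strengthens attraction
    """
    n, dim = len(Y), len(Y[0])

    # Student-t kernel weights w_ij, with w_ii = 0.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq = sum((Y[i][d] - Y[j][d]) ** 2 for d in range(dim))
                w[i][j] = 1.0 / (1.0 + sq)
    Z = sum(map(sum, w))  # normalizer, so that q_ij = w_ij / Z

    attr = [[0.0] * dim for _ in range(n)]
    rep = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for d in range(dim):
                diff = Y[j][d] - Y[i][d]
                # Attraction pulls i toward j, scaled by rho * p_ij.
                attr[i][d] += rho * P[i][j] * w[i][j] * diff
                # Repulsion pushes i away from every other point j.
                rep[i][d] -= (w[i][j] / Z) * w[i][j] * diff
    return attr, rep


# Toy example: points 0 and 1 are neighbors, point 2 has no neighbors.
Y = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
P = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]
attr1, rep1 = tsne_forces(Y, P, rho=1.0)
attr4, rep4 = tsne_forces(Y, P, rho=4.0)
```

In this sketch, raising `rho` multiplies only the attractive term, while the repulsive term stays the same for a given layout; this is the sense in which exaggeration shifts the attraction-repulsion balance.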
Taxonomy
Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Bioinformatics and Genomic Networks
