Revised Conditional t-SNE: Looking Beyond the Nearest Neighbors
Edith Heiter, Bo Kang, Ruth Seurinck, Jefrey Lijffijt

TL;DR
This paper introduces a revised version of conditional t-SNE (ct-SNE) that conditions the high-dimensional similarities rather than the low-dimensional ones. This improves scalability and, on synthetic data, embedding quality, in particular when the data is well clustered over the labels in the original high-dimensional space.
Contribution
The authors propose a revised conditional t-SNE that conditions the high-dimensional similarities and stores within- and across-label nearest neighbors separately, which makes the method compatible with recently proposed t-SNE speedups and improves scalability.
Findings
Revised ct-SNE improves embedding quality on synthetic data.
The method can use recently proposed t-SNE speedups, which improves scalability.
Performance on real data with batch effects varies, indicating open challenges.
Abstract
Conditional t-SNE (ct-SNE) is a recent extension to t-SNE that allows removal of known cluster information from the embedding, to obtain a visualization revealing structure beyond label information. This is useful, for example, when one wants to factor out unwanted differences between a set of classes. We show that ct-SNE fails in many realistic settings, namely if the data is well clustered over the labels in the original high-dimensional space. We introduce a revised method by conditioning the high-dimensional similarities instead of the low-dimensional similarities and storing within- and across-label nearest neighbors separately. This also enables the use of recently proposed speedups for t-SNE, improving the scalability. From experiments on synthetic data, we find that our proposed method resolves the considered problems and improves the embedding quality. On real data containing…
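To illustrate what "storing within- and across-label nearest neighbors separately" could look like, here is a minimal sketch. It is not the authors' implementation: the function name `split_label_neighbors`, the brute-force Euclidean distance computation, and the use of the same `k` for both neighbor sets are assumptions made for illustration. The point it shows is that when clusters coincide with labels, a plain k-nearest-neighbor list contains only same-label points, whereas a separate across-label list keeps information about points with other labels.

```python
import numpy as np

def split_label_neighbors(X, labels, k):
    """Hypothetical helper: for each point, return the indices of its
    k nearest same-label neighbors and its k nearest different-label
    neighbors (brute force, squared Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]

    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor

    same = labels[:, None] == labels[None, :]
    within, across = [], []
    for i in range(n):
        # Mask out the other group by setting its distances to infinity.
        d_same = np.where(same[i], d2[i], np.inf)
        d_diff = np.where(~same[i], d2[i], np.inf)
        w = np.argsort(d_same)[:k]
        a = np.argsort(d_diff)[:k]
        # Drop masked entries in case a group has fewer than k members.
        within.append(w[np.isfinite(d_same[w])])
        across.append(a[np.isfinite(d_diff[a])])
    return within, across

# Two well-separated clusters that coincide with the labels.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels = np.array([0, 0, 0, 1, 1, 1])
within, across = split_label_neighbors(X, labels, k=2)
```

In this toy example the two nearest neighbors of every point share its label, so a single neighbor list of size 2 would carry no across-label information at all. A real implementation would presumably replace the brute-force distance matrix with an approximate nearest-neighbor search to remain scalable.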
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data-Driven Disease Surveillance · Data Mining Algorithms and Applications
