A Cross Entropy test allows quantitative statistical comparison of t-SNE and UMAP representations
Carlos P. Roca, Oliver T. Burton, Julika Neumann, Samar Tareen, Carly E. Whyte, Stéphanie Humblet-Baron, Adrian Liston

TL;DR
This paper introduces a statistical test based on cross entropy and the Kolmogorov-Smirnov test to quantitatively compare t-SNE and UMAP representations of high-dimensional biomedical data, enabling robust analysis beyond visualization.
Contribution
The authors develop a novel statistical method for comparing dimensionality-reduced datasets, addressing a gap in quantitative tools for t-SNE and UMAP outputs.
Findings
The test can distinguish biological variation from reduction artifacts.
It provides a valid distance metric for dataset comparison.
The method enables hierarchical clustering of multiple samples.
Abstract
The advent of high dimensional single cell data in the biomedical sciences has necessitated the development of dimensionality-reduction tools. t-SNE and UMAP are the two most frequently used approaches, allowing clear visualisation of highly complex single cell datasets. Despite the ubiquity of these approaches and the clear need for quantitative comparison of single cell datasets, t-SNE and UMAP have largely remained data visualisation tools due to the lack of robust statistical approaches available. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets, using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the interrelationship of single cells for comparison, the resulting statistic is robust and capable of distinguishing between true biological…
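The core idea in the abstract, comparing per-cell cross entropy distributions with a Kolmogorov-Smirnov test, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say how the per-cell cross entropy is computed, so the kernels here are assumptions: a fixed-bandwidth Gaussian in the original space and a Student-t kernel in the embedding. The function names, the `sigma` parameter, and the toy "embedding" (simply the first two coordinates) are all hypothetical.

```python
import math
import random


def point_cross_entropy(high, low, sigma=1.0):
    """Per-point cross entropy H_i = -sum_j p_ij * log(q_ij).

    ASSUMPTION: p_ij uses a Gaussian kernel with one fixed bandwidth `sigma`
    in the original space; q_ij uses a Student-t kernel in the embedding.
    Both are normalised per point. Real t-SNE/UMAP probabilities differ.
    """
    n = len(high)
    entropies = []
    for i in range(n):
        p = [0.0 if j == i else
             math.exp(-math.dist(high[i], high[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)]
        q = [0.0 if j == i else
             1.0 / (1.0 + math.dist(low[i], low[j]) ** 2)
             for j in range(n)]
        p_sum, q_sum = sum(p), sum(q)
        h = 0.0
        if p_sum > 0.0:
            for j in range(n):
                if j != i and p[j] > 0.0:
                    h -= (p[j] / p_sum) * math.log(q[j] / q_sum)
        entropies.append(h)
    return entropies


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the ECDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def toy_sample(n, dims, shift, rng):
    """Hypothetical stand-in data: Gaussian points, optionally shifted."""
    return [[rng.gauss(shift, 1.0) for _ in range(dims)] for _ in range(n)]


rng = random.Random(0)
sample_a = toy_sample(30, 5, 0.0, rng)
sample_b = toy_sample(30, 5, 0.5, rng)

# Placeholder "embedding": keep the first two coordinates. In practice this
# would be the t-SNE or UMAP output for the same cells.
embed = lambda pts: [pt[:2] for pt in pts]

h_a = point_cross_entropy(sample_a, embed(sample_a))
h_b = point_cross_entropy(sample_b, embed(sample_b))
d_ab = ks_statistic(h_a, h_b)  # value in [0, 1]; larger means more different
```

The sketch returns only the KS statistic, which lies between 0 and 1. If SciPy is available, `scipy.stats.ks_2samp(h_a, h_b)` also returns a p-value for the same comparison.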
Taxonomy
Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques
