Towards a comprehensive visualization of structure in data
Joan Garriga, Frederic Bartumeus

TL;DR
This paper introduces a simplified, scalable t-SNE-based method with a novel parallelization protocol that enhances visualization of large data structures across multiple scales, balancing local and global data insights.
Contribution
The authors present pt-SNE, a parallelized t-SNE variant with a single parameter and a chunk&mix protocol, enabling efficient multi-scale data visualization.
Findings
pt-SNE converges to global embeddings comparable to state-of-the-art methods.
The chunk&mix protocol adds minimal noise, slightly reducing local accuracy.
Post-processing restores local scale visualization without losing global precision.
Abstract
Dimensionality reduction methods are fundamental for exploring and visualizing large data sets. Basic requirements for unsupervised data exploration are simplicity, flexibility and scalability. However, current methods show complex parameterizations and strong computational limitations when exploring large data structures across scales. Here, we focus on the t-SNE algorithm and show that a simplified parameter setup with a single control parameter, namely the perplexity, can effectively balance local and global data structure visualization. We also designed a chunk&mix protocol to efficiently parallelize t-SNE and explore data structure across a much wider range of scales than currently available. Our parallel version of BH-tSNE, namely pt-SNE, converges to a good global embedding, comparable to state-of-the-art solutions, though the chunk&mix protocol adds a little noise and decreases…
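Since the abstract singles out the perplexity as the one control parameter, a small sketch of how perplexity is conventionally tied to the Gaussian kernel width in standard t-SNE may help. This illustrates the generic t-SNE calibration step, not the pt-SNE implementation; the function names (conditional_probs, perplexity, calibrate_beta) and the tolerance values are assumptions made for this example.

```python
import math

def conditional_probs(sq_dists, beta):
    """Gaussian affinities p_{j|i} for one point, from squared distances to the others."""
    # Shift by the minimum distance for numerical stability; the shift cancels on normalization.
    d_min = min(sq_dists)
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """Perplexity is 2 raised to the Shannon entropy (in bits) of the distribution."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def calibrate_beta(sq_dists, target, tol=1e-5, max_iter=200):
    """Search for beta = 1 / (2 * sigma**2) whose perplexity matches the target (hypothetical helper)."""
    lo, hi = 0.0, None
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(sq_dists, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:
            # Distribution too flat: narrow the kernel by increasing beta.
            lo = beta
            beta = beta * 2.0 if hi is None else (beta + hi) / 2.0
        else:
            # Distribution too peaked: widen the kernel by decreasing beta.
            hi = beta
            beta = (lo + beta) / 2.0
    return beta

# Usage: squared distances from one point to eight neighbours, target perplexity 3.
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0]
beta = calibrate_beta(sq_dists, target=3.0)
achieved = perplexity(conditional_probs(sq_dists, beta))
```

A low perplexity concentrates each point's affinities on a few nearest neighbours (local structure), while a high perplexity spreads them over many points (global structure), which is why a single parameter can act as a scale control.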
Taxonomy
Topics: Data Visualization and Analytics · Single-cell and spatial transcriptomics · Data Stream Mining Techniques
