Visualization of Very Large High-Dimensional Data Sets as Minimum Spanning Trees
Daniel Probst, Jean-Louis Reymond

TL;DR
This paper introduces TMAP, a visualization method that represents large, high-dimensional chemical and biological data sets as two-dimensional trees, which the authors argue are easier to explore and interpret than t-SNE or UMAP projections.
Contribution
The paper presents TMAP, a scalable visualization algorithm that lays out millions of high-dimensional data points as a tree and is reported to preserve global and local structure better than t-SNE and UMAP.
Findings
TMAP can visualize data sets of up to millions of points with arbitrarily high dimensionality.
TMAP is reported to preserve local and global neighborhood structure better than t-SNE and UMAP.
TMAP is broadly applicable across scientific domains.
Abstract
The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of detail to allow for human inspection and interpretation. Here, we propose a solution to this problem with a new data visualization method, TMAP, capable of representing data sets of up to millions of data points and arbitrary high dimensionality as a two-dimensional tree (http://tmap.gdb.tools). Visualizations based on TMAP are better suited than t-SNE or UMAP for the exploration and interpretation of large data sets due to their tree-like nature, increased local and global neighborhood and structure preservation, and the transparency of the methods the algorithm is based on. We apply TMAP to the…
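To make the tree idea in the title concrete, here is a minimal sketch of reducing a high-dimensional point set to a minimum spanning tree. It is not the TMAP implementation: the summary above does not spell out the pipeline, so this assumes a brute-force k-nearest-neighbor graph followed by Kruskal's algorithm, and the helper names (`knn_edges`, `minimum_spanning_forest`) and the toy data are hypothetical. The brute-force neighbor search is quadratic and would not scale to millions of points; the two-dimensional layout of the resulting tree is also left out.

```python
import math

def knn_edges(points, k):
    """Brute-force k-NN graph (illustrative only; O(n^2) distance evaluations)."""
    edges = {}
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            # Store each undirected edge once, keyed by (smaller index, larger index).
            edges[(min(i, j), max(i, j))] = d
    return [(d, i, j) for (i, j), d in edges.items()]

def minimum_spanning_forest(n, edges):
    """Kruskal's algorithm with union-find; returns the kept (i, j, weight) edges."""
    parent = list(range(n))

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    for d, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it cannot close a cycle
            parent[ri] = rj
            kept.append((i, j, d))
    return kept

# Hypothetical toy data: two well-separated clusters in four dimensions.
points = [
    (0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0),
    (10.0, 10.0, 10.0, 10.0), (11.0, 10.0, 10.0, 10.0), (10.0, 11.0, 10.0, 10.0),
]

tree = minimum_spanning_forest(len(points), knn_edges(points, k=3))
for i, j, d in tree:
    print(f"{i} -- {j}  (distance {d:.2f})")
```

With `k=3` the toy k-NN graph is connected, so the result is a single tree with one edge bridging the two clusters. With `k=2` each point only links to its own cluster, and the same routine returns a forest of two trees, which is why the choice of `k` matters when the goal is one connected visualization.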
Taxonomy
Topics: Complex Network Analysis Techniques · Data Visualization and Analytics · Bioinformatics and Genomic Networks
