2-D Embedding of Large and High-dimensional Data with Minimal Memory and Computational Time Requirements
Witold Dzwinel, Rafal Wcislo, Stan Matwin

TL;DR
This paper introduces ivhd, a highly efficient data embedding method for large, high-dimensional datasets that significantly reduces computational and memory requirements while maintaining high embedding quality, enabling interactive visualization.
Contribution
The paper presents ivhd, a novel data embedding algorithm that outperforms existing methods in computational efficiency and memory usage and is suited to visualizing large-scale, high-dimensional data.
Findings
ivhd reduces time complexity from O(M log M) to O(M), where M is the number of data samples
ivhd maintains embedding quality comparable to state-of-the-art methods
ivhd demonstrates robustness and efficiency on datasets such as MNIST and RCV1
Abstract
With the advent of the big data era, interactive visualization of large data sets consisting of M ~ 10^5+ high-dimensional feature vectors of length N (N ~ 10^3+) is an indispensable tool for exploratory data analysis. The state-of-the-art data embedding (DE) methods that map N-D data into a visually perceptible 2-D (3-D) space (e.g., those based on the t-SNE concept) are too computationally demanding to be employed efficiently for interactive analytics of large and high-dimensional datasets. Herein we present a simple method, ivhd (interactive visualization of high-dimensional data tool), which radically outperforms modern data-embedding algorithms in both computational and memory loads, while retaining high quality of N-D data embedding in 2-D (3-D). We show that the DE problem is equivalent to the nearest neighbor nn-graph visualization, where only indices of a few nearest neighbors of each data sample…
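The abstract's point that only the indices of a few nearest neighbors per sample need to be kept can be illustrated with a minimal sketch. This is not the authors' ivhd implementation: the function name `knn_indices`, the brute-force Euclidean search, the choice of k = 3, and the random test data are all assumptions made for illustration. The brute-force search itself costs O(M^2) time and is used here only to keep the example self-contained; the point is the size of what is stored afterwards.

```python
import numpy as np


def knn_indices(X, k):
    """Return an (M, k) int32 array holding, for each row of X, the indices
    of its k nearest neighbors (Euclidean, brute force, self excluded).
    Hypothetical helper for illustration; requires k < M."""
    sq = np.sum(X * X, axis=1)
    # Squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    # argpartition puts the k smallest entries of each row first (unordered)
    idx = np.argpartition(d2, k, axis=1)[:, :k]
    return idx.astype(np.int32)


# Assumed toy data: M = 500 samples of length N = 64
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
nn = knn_indices(X, k=3)

# What is kept: M * k integers, instead of an M x M matrix of distances
stored_bytes = nn.nbytes
full_matrix_bytes = X.shape[0] * X.shape[0] * 8
```

With these assumed sizes the index table grows linearly in M for a fixed k, whereas a full table of pairwise distances grows quadratically.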
Taxonomy
Topics: Data Visualization and Analytics · Advanced Clustering Algorithms Research · Complex Network Analysis Techniques
