Visualizing the Finer Cluster Structure of Large-Scale and High-Dimensional Data
Yu Liang, Arin Chaudhuri, and Haoyu Wang

TL;DR
This paper introduces a generalized sigmoid function for visualizing high-dimensional data. A tunable parameter lets the user adjust how strongly the embedding emphasizes cluster structure, and the method produces visualizations comparable to those of UMAP on large data sets.
Contribution
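The paper proposes a generalized sigmoid function for modeling distance similarity in both the high- and low-dimensional spaces. In the low-dimensional space, a tunable parameter b controls the heaviness of the function's tail, which can reveal finer cluster structure in high-dimensional data.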
Findings
Effective visualization comparable to UMAP
Adjustable parameter b reveals finer clusters
Finer subclusters are meaningful according to domain knowledge
Abstract
Dimension reduction and visualization of high-dimensional data have become very important research topics because of the rapid growth of large databases in data science. In this paper, we propose using a generalized sigmoid function to model the distance similarity in both high- and low-dimensional spaces. In particular, the parameter b is introduced to the generalized sigmoid function in low-dimensional space, so that we can adjust the heaviness of the function tail by changing the value of b. Using both simulated and real-world data sets, we show that our proposed method can generate visualization results comparable to those of uniform manifold approximation and projection (UMAP), which is a newly developed manifold learning technique with fast running speed, better global structure, and scalability to massive data sets. In addition, according to the purpose of the study and the data…
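The effect of a tail-heaviness parameter can be illustrated with a small sketch. The exact functional form used in the paper is not given in this summary, so the code below assumes the curve family that UMAP uses for low-dimensional similarity, 1 / (1 + a * d^(2b)). The function name `low_dim_similarity` and the default values of `a` and `b` are hypothetical and chosen only for illustration.

```python
import numpy as np


def low_dim_similarity(d, a=1.0, b=1.0):
    """Assumed UMAP-style similarity curve: 1 / (1 + a * d**(2b)).

    This is an illustrative stand-in, not necessarily the paper's
    generalized sigmoid. Larger b makes the tail decay faster.
    """
    d = np.asarray(d, dtype=float)
    return 1.0 / (1.0 + a * d ** (2.0 * b))


# Compare how quickly similarity falls off with distance for several b.
distances = np.array([0.5, 1.0, 2.0, 4.0])
for b in (0.5, 1.0, 2.0):
    sims = low_dim_similarity(distances, b=b)
    print(f"b={b}: {np.round(sims, 4)}")
```

Under this assumed form, points farther apart than distance 1 retain more similarity when b is small (a heavier tail) and less when b is large (a lighter tail), which is the kind of adjustment the abstract describes.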
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Data Visualization and Analytics
