bigMap: Big Data Mapping with Parallelized t-SNE
Joan Garriga, Frederic Bartumeus

TL;DR
bigMap is an R package implementing a clustering protocol for large datasets. It combines a parallelized t-SNE, an adaptive kernel density estimate over the embedding, and a watershed segmentation of the resulting density landscape, and ships with tools for visualization and analysis.
Contribution
The paper introduces a parallelized t-SNE implementation suited to large datasets and a new adaptive kernel density estimation technique coupled with the t-SNE framework, which together aim to improve the accuracy and scalability of the clustering.
Findings
Significantly reduces t-SNE computational time for large datasets
Provides more accurate density estimates in low-dimensional embeddings
Enables effective clustering and visualization of big data
Abstract
We introduce an improved unsupervised clustering protocol specially suited for large-scale structured data. The protocol follows three steps: a dimensionality reduction of the data, a density estimation over the low-dimensional representation of the data, and a final segmentation of the density landscape. For the dimensionality reduction step we introduce a parallelized implementation of the well-known t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm that significantly alleviates some inherent limitations, while improving its suitability for large datasets. We also introduce a new adaptive Kernel Density Estimation particularly coupled with the t-SNE framework in order to get accurate density estimates out of the embedded data, and a variant of the rainfalling watershed algorithm to identify clusters within the density landscape. The whole mapping protocol is wrapped in the bigMap…
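The last two steps of the protocol (density estimation over a 2-D embedding, then segmentation of the density landscape) can be illustrated with a minimal Python sketch. This is not the bigMap API or the authors' algorithm. It assumes synthetic 2-D points standing in for a t-SNE embedding, a fixed-bandwidth Gaussian kernel in place of the paper's adaptive kernel density estimation, and a plain steepest-ascent labeling of grid cells as a simple rainfalling-style watershed. All function names and parameter values (`make_embedding`, `grid_density`, `rainfall_labels`, `nearest_cell`, the grid size, the bandwidth) are hypothetical.

```python
import math
import random


def make_embedding(seed=0):
    """Synthetic 2-D points standing in for a t-SNE embedding (assumption)."""
    rng = random.Random(seed)
    centers = [(-3.0, 0.0), (3.0, 0.0)]
    return [(rng.gauss(cx, 0.6), rng.gauss(cy, 0.6))
            for cx, cy in centers for _ in range(100)]


def grid_density(points, lo=-6.0, hi=6.0, n=30, bandwidth=0.8):
    """Unnormalized fixed-bandwidth Gaussian KDE on an n x n grid.

    Rows index y and columns index x. The fixed bandwidth is a
    simplification; the paper describes an adaptive estimator.
    """
    step = (hi - lo) / (n - 1)
    axis = [lo + i * step for i in range(n)]
    two_h2 = 2.0 * bandwidth ** 2
    dens = [[sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / two_h2)
                 for px, py in points)
             for x in axis]
            for y in axis]
    return dens, axis


def rainfall_labels(dens):
    """Label each cell by the local maximum reached via steepest ascent."""
    n = len(dens)

    def uphill(r, c):
        # Highest of the 8 neighbours, or the cell itself if none is higher.
        best = (r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < n and 0 <= cc < n
                        and dens[rr][cc] > dens[best[0]][best[1]]):
                    best = (rr, cc)
        return best

    peaks = {}  # maps a peak cell (row, col) to its cluster id
    labels = [[-1] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            cur = (r, c)
            while True:
                nxt = uphill(*cur)
                if nxt == cur:
                    break
                cur = nxt
            labels[r][c] = peaks.setdefault(cur, len(peaks))
    return labels, peaks


def nearest_cell(axis, x, y):
    """Return the (row, col) of the grid cell closest to the point (x, y)."""
    step = axis[1] - axis[0]
    last = len(axis) - 1
    row = max(0, min(last, round((y - axis[0]) / step)))
    col = max(0, min(last, round((x - axis[0]) / step)))
    return row, col


if __name__ == "__main__":
    pts = make_embedding()
    dens, axis = grid_density(pts)
    labels, peaks = rainfall_labels(dens)
    clusters = [labels[r][c] for r, c in (nearest_cell(axis, x, y) for x, y in pts)]
    print("density peaks found:", len(peaks))
    print("cluster ids used by the points:", sorted(set(clusters)))
```

Each data point inherits the label of the grid cell it falls in, so points that climb to the same density peak end up in the same cluster. A coarse grid and a single global bandwidth keep the sketch short; both choices trade accuracy for simplicity.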
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Gene expression and cancer classification
