Data Segmentation via t-SNE, DBSCAN, and Random Forest
Timothy DeLise

TL;DR
This paper introduces a data segmentation method combining t-SNE, DBSCAN, and Random Forest to identify natural clusters, characterize them, and generalize to new data, demonstrated on multiple datasets.
Contribution
It presents an end-to-end clustering pipeline that integrates dimensionality reduction, density-based clustering, and classification, with application to diverse real-world datasets.
Findings
Effective clustering on Iris, MNIST, and Instagram data
Generalizes well to out-of-sample data
Provides meaningful cluster profiles
Abstract
This research proposes a data segmentation algorithm that combines t-SNE, DBSCAN, and a Random Forest classifier into an end-to-end pipeline. The pipeline separates data into natural clusters and produces a characteristic profile of each cluster based on its most important features. Out-of-sample cluster labels can be inferred, and the technique generalizes well on real data sets. We describe the algorithm and provide case studies using the Iris and MNIST data sets, as well as real social media data from Instagram. This is a proof of concept and sets the stage for further in-depth theoretical analysis.
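The three stages named in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration, not the author's implementation. The function name `segment`, the `eps`, `min_samples`, and `perplexity` values, the standardization of the embedding, and the choice to train the classifier on the original features (so that new points can be labeled without re-running t-SNE) are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def segment(X_fit, X_new, eps=0.3, min_samples=5, seed=0):
    """Hypothetical helper: embed, cluster, then learn to predict the clusters."""
    # Stage 1: t-SNE reduces the data to two dimensions.
    embedding = TSNE(
        n_components=2, perplexity=30, init="pca", random_state=seed
    ).fit_transform(X_fit)
    # Assumption: rescale the embedding so eps does not depend on t-SNE's scale.
    embedding = StandardScaler().fit_transform(embedding)

    # Stage 2: DBSCAN finds dense groups in the embedding; -1 marks noise.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embedding)

    # Stage 3: a Random Forest learns cluster membership from the ORIGINAL
    # features (assumption), skipping points that DBSCAN marked as noise.
    keep = labels != -1
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_fit[keep], labels[keep])

    # Feature importances give a rough profile of what separates the clusters;
    # predict() infers cluster labels for out-of-sample points.
    return labels, forest.feature_importances_, forest.predict(X_new)


iris = load_iris()
X_fit, X_new = train_test_split(iris.data, test_size=0.25, random_state=0)
labels, importances, new_labels = segment(X_fit, X_new)

n_clusters = len(set(labels.tolist()) - {-1})
print("clusters found:", n_clusters)
for idx in np.argsort(importances)[::-1]:
    print(f"{iris.feature_names[idx]}: {importances[idx]:.3f}")
print("out-of-sample labels:", new_labels[:10])
```

The number of clusters depends on the t-SNE random seed and the DBSCAN parameters, so the values above would need tuning for other data sets.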
Taxonomy
Topics: Advanced Clustering Algorithms Research
