Deep Clustering of Tabular Data by Weighted Gaussian Distribution Learning
Shourav B. Rabbani, Ivan V. Medri, Manar D. Samad

TL;DR
This paper introduces G-CEALS, a deep clustering method for tabular data that models multivariate Gaussian cluster distributions in an autoencoder's latent space and outperforms traditional clustering techniques.
Contribution
It presents one of the first deep clustering frameworks designed specifically for tabular data, addressing the representation-learning challenges posed by heterogeneous features.
Findings
G-CEALS outperforms nine state-of-the-art clustering methods.
Achieves average ranks of 2.9 (clustering accuracy) and 2.8 (ARI) across sixteen tabular datasets.
Significantly improves clustering performance over K-means and Gaussian mixture model (GMM) clustering.
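The findings are stated in terms of clustering accuracy and the adjusted Rand index (ARI). The following is a minimal sketch of how these two scores are commonly computed. It is not the authors' evaluation code, and the function names `contingency`, `adjusted_rand_index`, and `clustering_accuracy` are hypothetical. Accuracy uses the best one-to-one mapping between predicted clusters and true classes. ARI compares how many sample pairs the two labelings agree on against the agreement expected by chance.

```python
import itertools

import numpy as np

# Illustrative evaluation helpers (hypothetical names), not the paper's code.


def contingency(y_true, y_pred):
    """Counts table: rows are true classes, columns are predicted clusters."""
    _, t = np.unique(np.asarray(y_true), return_inverse=True)
    _, p = np.unique(np.asarray(y_pred), return_inverse=True)
    t, p = t.ravel(), p.ravel()
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)
    return table


def pairs(x):
    """Number of unordered pairs, C(x, 2), elementwise."""
    x = np.asarray(x, dtype=float)
    return x * (x - 1.0) / 2.0


def adjusted_rand_index(y_true, y_pred):
    """ARI from pair counts; assumes at least two samples."""
    table = contingency(y_true, y_pred)
    index = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    total = pairs(table.sum())
    expected = sum_rows * sum_cols / total  # chance-level agreement
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # degenerate partitions, e.g. one cluster on both sides
        return 1.0
    return float((index - expected) / (max_index - expected))


def clustering_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one cluster-to-class mappings (brute force, small k)."""
    table = contingency(y_true, y_pred)
    k = max(table.shape)
    square = np.zeros((k, k), dtype=np.int64)
    square[: table.shape[0], : table.shape[1]] = table  # pad so every cluster gets a slot
    rows = np.arange(k)
    best = max(square[rows, list(perm)].sum() for perm in itertools.permutations(range(k)))
    return float(best / table.sum())
```

The permutation search is only practical for a handful of clusters. For larger `k`, the usual approach is to apply `scipy.optimize.linear_sum_assignment` to the negated contingency table, and `sklearn.metrics.adjusted_rand_score` provides a reference ARI implementation.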
Abstract
Deep learning methods are primarily proposed for supervised learning of images or text with limited applications to clustering problems. In contrast, tabular data with heterogeneous features pose unique challenges in representation learning, where deep learning has yet to replace traditional machine learning. This paper addresses these challenges in developing one of the first deep clustering methods for tabular data: Gaussian Cluster Embedding in Autoencoder Latent Space (G-CEALS). G-CEALS is an unsupervised deep clustering framework for learning the parameters of multivariate Gaussian cluster distributions by iteratively updating individual cluster weights. The G-CEALS method presents average rank orderings of 2.9(1.7) and 2.8(1.7) based on clustering accuracy and adjusted Rand index (ARI) scores on sixteen tabular data sets, respectively, and outperforms nine state-of-the-art…
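According to the abstract, G-CEALS learns the parameters of multivariate Gaussian cluster distributions and iteratively updates individual cluster weights. The excerpt does not give the loss or the update schedule. The sketch below therefore swaps in a plain expectation-maximization (EM) loop for a weighted Gaussian mixture on fixed latent codes `z`. It omits the autoencoder entirely and is not the authors' procedure. The function names, the farthest-point initialization, and the small covariance ridge `reg` are all assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only: plain EM for a weighted Gaussian mixture on fixed
# latent codes z (n x d, d >= 2). Not the G-CEALS training loop; no autoencoder.


def log_gaussian(z, mean, cov):
    """Log-density of each row of z under N(mean, cov)."""
    d = z.shape[1]
    diff = z - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def soft_assign(z, weights, means, covs):
    """Responsibilities gamma[i, k] proportional to weights[k] * N(z_i | means[k], covs[k])."""
    log_p = np.stack(
        [np.log(w) + log_gaussian(z, m, c) for w, m, c in zip(weights, means, covs)],
        axis=1,
    )
    log_p -= log_p.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    resp = np.exp(log_p)
    return resp / resp.sum(axis=1, keepdims=True)


def update_params(z, resp, reg=1e-6):
    """Re-estimate cluster weights, means, and covariances from responsibilities."""
    nk = resp.sum(axis=0) + 1e-12  # soft cluster sizes
    weights = nk / nk.sum()
    means = (resp.T @ z) / nk[:, None]
    covs = []
    for k in range(resp.shape[1]):
        diff = z - means[k]
        cov = (resp[:, k, None] * diff).T @ diff / nk[k]
        covs.append(cov + reg * np.eye(z.shape[1]))  # small ridge keeps cov invertible
    return weights, means, np.array(covs)


def fit_latent_gaussians(z, k, n_iter=50, seed=0):
    """Alternate soft assignment and parameter updates for k Gaussian clusters."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization of the means (an assumption, chosen for stability).
    idx = [int(rng.integers(len(z)))]
    for _ in range(1, k):
        d2 = ((z[:, None, :] - z[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(d2.argmax()))
    means = z[idx]
    base_cov = np.cov(z, rowvar=False) + 1e-6 * np.eye(z.shape[1])
    covs = np.repeat(base_cov[None, :, :], k, axis=0)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = soft_assign(z, weights, means, covs)
        weights, means, covs = update_params(z, resp)
    return weights, means, covs, resp
```

Calling `fit_latent_gaussians(z, k)` on an `(n, d)` array of latent codes returns the cluster weights, means, covariances, and responsibilities, and `resp.argmax(axis=1)` gives hard cluster labels. In a deep clustering setting, these responsibilities would typically feed a loss that also updates the encoder. That coupling is not shown here.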
