TableDC: Deep Clustering for Tabular Data
Hafiz Tayyab Rauf, Andre Freitas, Norman W. Paton

TL;DR
TableDC introduces a deep clustering algorithm tailored for tabular data, leveraging Mahalanobis distance and heavy-tailed distributions to improve clustering accuracy, robustness, and scalability in data management tasks.
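To make the two ingredients named above concrete, here is a minimal sketch of soft cluster assignment that combines a Mahalanobis distance with a heavy-tailed kernel. It is an illustration under stated assumptions, not the authors' implementation: the Cauchy-style kernel, the single shared covariance matrix, the `gamma` scale parameter, and the function names `mahalanobis_sq` and `soft_assign` are all hypothetical choices made for this example.

```python
import numpy as np

def mahalanobis_sq(z, centers, cov):
    """Squared Mahalanobis distance from every row of z to every center.

    z: (n, d) embeddings, centers: (k, d), cov: (d, d) shared covariance
    (assumed invertible). Returns an (n, k) array.
    """
    cov_inv = np.linalg.inv(cov)
    diff = z[:, None, :] - centers[None, :, :]  # shape (n, k, d)
    return np.einsum("nkd,de,nke->nk", diff, cov_inv, diff)

def soft_assign(z, centers, cov, gamma=1.0):
    """Heavy-tailed (Cauchy-style) soft assignments; each row sums to 1.

    The kernel 1 / (1 + d^2 / gamma^2) decays polynomially, so distant
    points (potential outliers) keep a small but non-negligible weight.
    """
    d2 = mahalanobis_sq(z, centers, cov)
    q = 1.0 / (1.0 + d2 / gamma**2)
    return q / q.sum(axis=1, keepdims=True)

# Toy data: two Gaussian blobs standing in for learned embeddings.
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(4.0, 1.0, (50, 2))])
centers = np.array([[0.0, 0.0], [4.0, 4.0]])  # assumed known for the demo
cov = np.cov(z, rowvar=False)                 # pooled sample covariance

q = soft_assign(z, centers, cov)
labels = q.argmax(axis=1)
```

In a deep clustering setting, `z` would come from a trained encoder and the centers would be learned jointly with it; here both are fixed so the assignment step can be read in isolation.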
Contribution
The paper presents a novel deep clustering method for tabular data that effectively handles overlapping clusters, outliers, and large cluster numbers, tailored for data management applications.
Findings
Outperforms existing deep clustering and standard methods on benchmark datasets.
Provides higher tolerance to outliers and overlapping clusters.
Scales efficiently with large numbers of clusters.
Abstract
Deep clustering (DC), a fusion of deep representation learning and clustering, has recently demonstrated positive results in data science, particularly text processing and computer vision. However, joint optimization of feature learning and data distribution in the multi-dimensional space is domain-specific, so existing DC methods struggle to generalize to other application domains (such as data integration and cleaning). In data management tasks, where high-density embeddings and overlapping clusters dominate, a data management-specific DC algorithm should be able to interact better with the data properties for supporting data cleaning and integration tasks. This paper presents a deep clustering algorithm for tabular data (TableDC) that reflects the properties of data management applications, particularly schema inference, entity resolution, and domain discovery. To address overlapping…
Taxonomy
Topics: Time Series Analysis and Forecasting · Advanced Clustering Algorithms Research · Anomaly Detection Techniques and Applications
