Clustering Mixed Datasets Using Homogeneity Analysis with Applications to Big Data
Rajiv Sambasivan, Sourish Das

TL;DR
This paper explores using homogeneity analysis to cluster datasets with mixed numerical and categorical data, enabling the application of Euclidean-based tools for big data analysis.
Contribution
It introduces a method to represent mixed datasets in Euclidean space via homogeneity analysis, facilitating clustering and analysis.
Findings
Experiments suggest the approach is useful for clustering mixed datasets
Applicable to large-scale big data scenarios
Enables use of Euclidean tools for mixed data analysis
Abstract
Datasets with a mixture of numerical and categorical attributes are routinely encountered in many application domains. In this work we examine an approach to clustering such datasets using homogeneity analysis. Homogeneity analysis determines a Euclidean representation of the data. This representation can then be analyzed by leveraging the large body of tools and techniques for data with a Euclidean representation. Experiments conducted as part of this study suggest that this approach can be useful in the analysis and exploration of big datasets with a mixture of numerical and categorical attributes.
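To make the idea in the abstract concrete, here is a minimal sketch of how a mixed dataset might be mapped to Euclidean object scores and then clustered. It is not the authors' implementation. It rests on several assumptions that the summary above does not confirm: numerical attributes are discretized into quantile bins, the object scores are computed with the SVD (multiple correspondence analysis) formulation of homogeneity analysis rather than alternating least squares, and the clustering step is a small hand-written k-means. All function names (`bin_numeric`, `indicator_matrix`, `object_scores`, `kmeans`) and the toy data are hypothetical.

```python
import numpy as np

def bin_numeric(values, n_bins=2):
    """Discretize a numeric column into quantile bins (an assumption of this sketch)."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return [int(b) for b in np.digitize(values, edges)]

def indicator_matrix(columns):
    """Stack the one-hot (indicator) encoding of each categorical column side by side."""
    blocks = []
    for col in columns:
        levels = sorted(set(col))
        blocks.append(np.array([[1.0 if v == lv else 0.0 for lv in levels] for v in col]))
    return np.hstack(blocks)

def object_scores(G, n_vars, dim=1):
    """Object scores X with zero column means and X'X = n*I, via SVD of the scaled indicator matrix."""
    n = G.shape[0]
    Z = G / np.sqrt(G.sum(axis=0)) / np.sqrt(n_vars)  # weight columns by category frequency
    Z = Z - Z.mean(axis=0)                            # centering removes the trivial constant solution
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return np.sqrt(n) * U[:, :dim]

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical toy data: two categorical attributes and one numerical attribute.
color = ["red"] * 4 + ["blue"] * 4
size = ["S", "S", "M", "S", "L", "L", "M", "L"]
weight = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]

columns = [color, size, bin_numeric(weight)]
G = indicator_matrix(columns)
scores = object_scores(G, n_vars=len(columns), dim=1)
labels = kmeans(scores, k=2)
print(labels)
```

Once the rows live in Euclidean space, any distance-based tool can be applied to `scores`; the k-means step here could be swapped for another clustering method. Note that the normalization used above gives every retained dimension equal variance, so the number of dimensions kept can noticeably change the clusters found.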
