Spectral Clustering of Categorical and Mixed-type Data via Extra Graph Nodes
Dylan Soemitro, Jeova Farias Sales Rocha Neto

TL;DR
This paper introduces a spectral clustering method that handles categorical and mixed-type data by adding extra graph nodes, one for each category, yielding an interpretable, efficient, and competitive clustering algorithm that needs no complex preprocessing.
Contribution
The paper proposes a new spectral clustering framework using extra nodes for categories, enabling natural handling of mixed data types and linear-time clustering for categorical data.
Findings
The method achieves competitive clustering performance.
It operates in linear time for categorical data.
It avoids complex data preprocessing steps.
Abstract
Clustering data objects into homogeneous groups is one of the most important tasks in data mining. Spectral clustering is arguably one of the most important algorithms for clustering, as it is appealing for its theoretical soundness and is adaptable to many real-world data settings. For example, mixed data, where the data is composed of numerical and categorical features, is typically handled via numerical discretization, dummy coding, or similarity computation that takes into account both data types. This paper explores a more natural way to incorporate both numerical and categorical information into the spectral clustering algorithm, avoiding the need for data preprocessing or the use of sophisticated similarity functions. We propose adding extra nodes corresponding to the different categories the data may belong to and show that it leads to an interpretable clustering objective…
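To make the extra-node idea concrete, the following is a minimal sketch, not the authors' algorithm. It assumes purely categorical records, unit edge weights between each object and the category nodes it belongs to, and a two-way split taken from the sign of the second singular vector of the degree-normalized object-by-category matrix. The function name `two_way_split` and the toy records are hypothetical. The dense SVD is used for brevity and does not reflect the linear-time behavior reported in the paper.

```python
import numpy as np

def two_way_split(records):
    """Split categorical records into two groups via a bipartite
    object/category graph (illustrative sketch, unit edge weights assumed)."""
    # One extra node per (feature index, value) pair observed in the data.
    categories = sorted({(j, v) for row in records for j, v in enumerate(row)})
    col = {c: k for k, c in enumerate(categories)}

    # Biadjacency matrix: B[i, k] = 1 if object i belongs to category k.
    B = np.zeros((len(records), len(categories)))
    for i, row in enumerate(records):
        for j, v in enumerate(row):
            B[i, col[(j, v)]] = 1.0

    # Degree-normalize both sides: D_obj^{-1/2} B D_cat^{-1/2}.
    d_obj = B.sum(axis=1)
    d_cat = B.sum(axis=0)
    Bn = B / np.sqrt(d_obj)[:, None] / np.sqrt(d_cat)[None, :]

    # The first singular vector is trivial for a connected graph;
    # the second one carries the relaxed two-way cut.
    U, _, _ = np.linalg.svd(Bn, full_matrices=False)
    return (U[:, 1] > 0).astype(int)

# Hypothetical toy data: two colour/size groups linked only through "medium".
records = [
    ("red", "small"), ("red", "small"), ("red", "medium"),
    ("blue", "large"), ("blue", "large"), ("blue", "medium"),
]
labels = two_way_split(records)
print(labels)
```

Which group receives label 0 or 1 depends on the sign convention of the SVD, so only the grouping is meaningful. Mixed-type data would also need edges between object nodes that encode numerical similarity, which this sketch leaves out.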
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research
Methods: Spectral Clustering
