Similarity-based Distance for Categorical Clustering using Space Structure
Utkarsh Nath, Shikha Asrani, Rahul Katarya

TL;DR
This paper introduces a new similarity-based distance metric for categorical data clustering, demonstrating improved performance over existing methods like k-modes through experiments.
Contribution
The paper proposes a novel similarity-based distance metric for categorical data, enhancing clustering accuracy in space structure-based algorithms.
Findings
Similarity-based distance (SBD) significantly outperforms k-modes in the paper's experiments.
SBD improves clustering accuracy on categorical datasets.
The proposed method enhances the results of space structure-based clustering.
Abstract
Clustering is the task of spotting patterns in a group of objects and grouping similar objects together. Objects have attributes that are not always numerical; sometimes an attribute has a domain of categories to which its values belong. Such data is called categorical data. Many clustering algorithms are used to group categorical data, among which the k-modes algorithm has so far given the most significant results. Nevertheless, there is still much that could be improved. Algorithms such as k-means, fuzzy c-means, and hierarchical clustering have given far better accuracies with numerical data. In this paper, we propose a novel distance metric, similarity-based distance (SBD), to find the distance between objects of categorical data. Experiments have shown that our proposed distance (SBD), when used with the SBC (space structure based clustering) type algorithm, significantly outperforms the…
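For context on the baseline the abstract compares against, the following is a minimal sketch of the simple matching dissimilarity that standard k-modes uses for categorical objects, together with the per-attribute mode that serves as a cluster representative. It is not the SBD metric proposed in the paper, whose definition does not appear in this summary. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def simple_matching_distance(a, b):
    """Count the attributes on which two categorical objects differ.

    This is the standard k-modes dissimilarity, NOT the paper's SBD.
    """
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(objects):
    """Return the per-attribute most frequent category of a cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))


# Hypothetical toy data: (colour, size, shape)
objects = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "square"),
]

mode = cluster_mode(objects)
distances = [simple_matching_distance(obj, mode) for obj in objects]
print(mode)       # ('red', 'small', 'round')
print(distances)  # [0, 1, 2]
```

Simple matching treats every mismatch as equally large: "red" versus "blue" costs exactly as much as any other disagreement. A similarity-based distance for categorical data is motivated by relaxing that limitation.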
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Clustering Algorithms Research · Face and Expression Recognition
