Fast Clustering of Categorical Big Data
Bipana Thapaliya, Yu Zhuang

TL;DR
This paper introduces BK-Modes, a bisecting approach to improve initial cluster centers for K-Modes, resulting in better clustering quality and efficiency for large categorical datasets.
Contribution
The paper proposes BK-Modes, a novel bisecting method for selecting initial centers in K-Modes, enhancing clustering performance on big data.
Findings
BK-Modes outperforms existing methods in clustering quality.
BK-Modes is more efficient for large datasets.
Experimental results show improved performance in both quality and speed.
Abstract
The K-Modes algorithm, developed for clustering categorical data, is algorithmically simple but suffers from unreliable performance in both clustering quality and clustering efficiency, each heavily influenced by the choice of initial cluster centers. In this paper, we investigate Bisecting K-Modes (BK-Modes), a successive bisecting process for finding clusters, and examine how good the cluster centers it produces are when used as initial centers for K-Modes. BK-Modes splits a dataset into multiple clusters iteratively: in each iteration, one cluster is chosen and bisected into two. We use the sum of distances of data to their cluster centers as the selection metric for choosing the cluster to be bisected in each iteration. This iterative process stops when K clusters have been produced. The centers of these K clusters are then used as the…
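The bisecting procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumed: the usual K-Modes simple matching dissimilarity (count of mismatched attributes) and attribute-wise modes as centers. Also assumed: each bisection runs 2-Modes seeded with the cluster's first record and the record farthest from it (the abstract does not say how a cluster is bisected), ties are broken by first occurrence, and clusters that cannot be split (all records identical) are set aside. All function names (`mismatch`, `mode_of`, `cost`, `bisect`, `bk_modes_init`, `bk_modes`) are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Record = Tuple[str, ...]
Cluster = Tuple[Record, List[Record]]  # (center, member records)


def mismatch(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(points: List[Record]) -> Record:
    """Attribute-wise most frequent category; ties go to the first value seen."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*points))


def cost(points: List[Record], center: Record) -> int:
    """Selection metric from the abstract: sum of distances to the cluster center."""
    return sum(mismatch(p, center) for p in points)


def assign(points: List[Record], centers: List[Record]) -> List[int]:
    """Index of the nearest center for every record (lowest index wins ties)."""
    return [min(range(len(centers)), key=lambda j: mismatch(p, centers[j]))
            for p in points]


def k_modes(points: List[Record], centers: List[Record],
            max_iter: int = 100) -> Tuple[List[Record], List[int]]:
    """Plain K-Modes: alternate assignment and mode updates until centers stop changing."""
    centers = list(centers)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mode_of(members) if members else old)  # empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def bisect(members: List[Record]) -> Optional[Tuple[Cluster, Cluster]]:
    """Split one cluster with 2-Modes. Hypothetical seeding: the first record and the
    record farthest from it. Returns None if one side comes out empty."""
    a = members[0]
    b = max(members, key=lambda p: mismatch(p, a))
    centers, labels = k_modes(members, [a, b])
    left = [p for p, lab in zip(members, labels) if lab == 0]
    right = [p for p, lab in zip(members, labels) if lab == 1]
    if not left or not right:
        return None
    return (centers[0], left), (centers[1], right)


def bk_modes_init(points: List[Record], k: int) -> List[Record]:
    """Bisect until k clusters exist; return their centers (fewer if no cluster can be split)."""
    active: List[Cluster] = [(mode_of(points), list(points))]
    stuck: List[Cluster] = []  # clusters whose records are all identical or would not split
    while active and len(active) + len(stuck) < k:
        # Choose the cluster with the largest sum of distances to its center.
        i = max(range(len(active)), key=lambda j: cost(active[j][1], active[j][0]))
        center, members = active.pop(i)
        halves = bisect(members) if cost(members, center) > 0 else None
        if halves is None:
            stuck.append((center, members))
        else:
            active.extend(halves)
    return [center for center, _ in active + stuck]


def bk_modes(points: List[Record], k: int) -> Tuple[List[Record], List[int]]:
    """Run K-Modes from the bisecting centers instead of random initial centers."""
    return k_modes(points, bk_modes_init(points, k))
```

For example, `bk_modes(records, 2)` returns the final modes and one cluster label per record. The deterministic seeding inside `bisect` keeps runs reproducible; the paper's actual bisection step may differ, and only the selection metric and stopping rule are taken from the abstract.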
Taxonomy
Topics: Advanced Clustering Algorithms Research
