K-Histograms: An Efficient Clustering Algorithm for Categorical Dataset
Zengyou He, Xiaofei Xu, Shengchun Deng, Bin Dong

TL;DR
This paper introduces k-histogram, an efficient clustering algorithm for categorical data that extends k-means by replacing cluster means with histograms. On real datasets it is reported to produce better clusterings than k-modes, the most closely related method.
Contribution
The paper presents a histogram-based extension of k-means for clustering categorical data: each cluster is summarized by histograms instead of a mean, and those histograms are updated dynamically as objects are assigned.
Findings
k-histogram outperforms k-modes in clustering quality
The algorithm dynamically updates histograms during clustering
Experimental results on real datasets validate effectiveness
Abstract
Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-histogram, a new efficient algorithm for clustering categorical data. The k-histogram algorithm extends the k-means algorithm to the categorical domain by replacing the means of clusters with histograms, and dynamically updates the histograms during the clustering process. Experimental results on real datasets show that the k-histogram algorithm can produce better clustering results than the k-modes algorithm, the one most closely related to our work.
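The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not spell out the dissimilarity measure or the seeding rule, so the code assumes a distance equal to the sum over attributes of one minus the relative frequency of the object's value in the cluster, and it seeds each cluster with one of the first k objects. The names `distance` and `k_histograms` are hypothetical.

```python
from collections import Counter


def distance(hists, size, obj):
    """Assumed dissimilarity: sum over attributes of
    1 - (relative frequency of obj's value in the cluster)."""
    if size == 0:
        # An empty cluster is treated as maximally distant (assumption).
        return float(len(obj))
    return sum(1.0 - h[v] / size for h, v in zip(hists, obj))


def k_histograms(data, k, max_iter=100):
    """Cluster tuples of categorical values; returns (labels, histograms)."""
    n_attrs = len(data[0])
    # One Counter (value -> count) per attribute, per cluster.
    hists = [[Counter() for _ in range(n_attrs)] for _ in range(k)]
    sizes = [0] * k
    labels = [None] * len(data)

    def add(c, obj):
        for h, v in zip(hists[c], obj):
            h[v] += 1
        sizes[c] += 1

    def remove(c, obj):
        for h, v in zip(hists[c], obj):
            h[v] -= 1
        sizes[c] -= 1

    # Seeding (assumption): the first k objects each start one cluster.
    for c in range(k):
        add(c, data[c])
        labels[c] = c

    for _ in range(max_iter):
        moved = False
        for i, obj in enumerate(data):
            best = min(range(k),
                       key=lambda c: distance(hists[c], sizes[c], obj))
            if best != labels[i]:
                # Update both histograms immediately, so later objects in
                # the same pass already see the new cluster summaries.
                if labels[i] is not None:
                    remove(labels[i], obj)
                add(best, obj)
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels, hists


data = [("red", "small"), ("blue", "large"),
        ("red", "small"), ("blue", "large")]
labels, hists = k_histograms(data, k=2)
```

Unlike a mode, which keeps only the most frequent value per attribute, each histogram keeps the full value counts, so a minority value still lowers the distance for objects that share it.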
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Data Mining Algorithms and Applications
