K-Histograms: An Efficient Clustering Algorithm for Categorical Dataset

Zengyou He; Xiaofei Xu; Shengchun Deng; Bin Dong

arXiv:cs/0509033·cs.AI·May 23, 2007·31 cites

TL;DR

This paper introduces k-histogram, an efficient clustering algorithm for categorical data that extends k-means by replacing cluster means with histograms, and reports better clustering results than the k-modes algorithm on real datasets.

Contribution

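The paper presents a histogram-based extension of k-means for clustering categorical data. Each cluster is summarized by histograms of its attribute values rather than by a mean, with the aim of improving both efficiency and clustering quality.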
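To illustrate why a histogram can carry more information than a single k-modes mode, here is a small comparison on a toy cluster. The histogram dissimilarity used here (for each attribute, one minus the fraction of cluster members that share the object's value) is an assumption chosen for illustration and is not necessarily the measure defined in the paper; `mode_distance` and `histogram_distance` are hypothetical helper names.

```python
from collections import Counter
from typing import List, Sequence


def mode_distance(x: Sequence[str], mode: Sequence[str]) -> int:
    """k-modes style simple matching: number of attributes where x differs from the mode."""
    return sum(a != m for a, m in zip(x, mode))


def histogram_distance(x: Sequence[str], hists: List[Counter], size: int) -> float:
    """Assumed histogram dissimilarity: for each attribute, 1 minus the
    relative frequency of x's value among the cluster's members."""
    return sum(1.0 - h[v] / size for v, h in zip(x, hists))


# Toy cluster with three members and two attributes (colour, size).
members = [("red", "small"), ("red", "large"), ("blue", "large")]
hists = [Counter(column) for column in zip(*members)]  # one histogram per attribute
mode = tuple(h.most_common(1)[0][0] for h in hists)    # ("red", "large")

partial = ("blue", "small")   # each value occurs once in the cluster
unseen = ("green", "medium")  # no value occurs in the cluster

print(mode_distance(partial, mode), mode_distance(unseen, mode))  # 2 2
print(histogram_distance(partial, hists, len(members)))           # about 1.333
print(histogram_distance(unseen, hists, len(members)))            # 2.0
```

Under simple matching both objects are two mismatches away from the mode, whereas the histogram distance ranks the object whose values already occur in the cluster as closer.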

Findings

1. k-histogram outperforms k-modes in clustering quality.
2. The algorithm dynamically updates histograms during clustering.
3. Experimental results on real datasets validate its effectiveness.

Abstract

Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-histogram, a new efficient algorithm for clustering categorical data. The k-histogram algorithm extends the k-means algorithm to the categorical domain by replacing the means of clusters with histograms, and dynamically updates the histograms during the clustering process. Experimental results on real datasets show that the k-histogram algorithm can produce better clustering results than the k-modes algorithm, which is the most closely related to our work.
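The abstract describes the algorithm only at a high level. A minimal k-means-style loop with histogram centroids might look like the following sketch, which rests on assumptions the abstract does not fix: the dissimilarity between an object and a cluster is taken to be the sum over attributes of one minus the relative frequency of the object's value in that cluster; each cluster is seeded with one object (chosen at random, or given through a hypothetical `init` parameter); and "dynamically updates histograms" is read as updating a cluster's histograms immediately whenever an object joins or leaves it, similar to the per-object mode updates in k-modes. The names `k_histograms` and `dist` are illustrative, not the authors' code.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Tuple[str, ...]


def dist(x: Row, hists: List[Counter], size: int) -> float:
    """Assumed dissimilarity: sum over attributes of (1 - relative frequency of x's value)."""
    if size == 0:
        return float(len(x))
    return sum(1.0 - h[v] / size for v, h in zip(x, hists))


def k_histograms(data: Sequence[Row], k: int, max_iter: int = 10,
                 init: Optional[List[int]] = None, seed: int = 0) -> List[int]:
    """Cluster categorical rows into k clusters; returns one cluster label per row."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of rows")
    m = len(data[0])
    hists = [[Counter() for _ in range(m)] for _ in range(k)]  # per cluster, per attribute
    sizes = [0] * k
    labels = [-1] * len(data)

    def add(i: int, c: int) -> None:
        for j, v in enumerate(data[i]):
            hists[c][j][v] += 1
        sizes[c] += 1
        labels[i] = c

    def remove(i: int, c: int) -> None:
        for j, v in enumerate(data[i]):
            hists[c][j][v] -= 1
        sizes[c] -= 1

    # Initialisation (assumed): seed each cluster with one distinct object.
    seeds = init if init is not None else random.Random(seed).sample(range(len(data)), k)
    if len(seeds) != k or len(set(seeds)) != k:
        raise ValueError("init must contain k distinct row indices")
    for c, i in enumerate(seeds):
        add(i, c)

    # First pass: assign every remaining object to its nearest cluster,
    # updating that cluster's histograms immediately.
    for i, x in enumerate(data):
        if labels[i] == -1:
            add(i, min(range(k), key=lambda c: dist(x, hists[c], sizes[c])))

    # Reallocation passes: move an object only if another cluster is strictly
    # closer, updating both clusters' histograms as soon as it moves.
    for _ in range(max_iter):
        moved = 0
        for i, x in enumerate(data):
            old = labels[i]
            best = min(range(k), key=lambda c: dist(x, hists[c], sizes[c]))
            if dist(x, hists[best], sizes[best]) < dist(x, hists[old], sizes[old]):
                remove(i, old)
                add(i, best)
                moved += 1
        if moved == 0:
            break
    return labels


if __name__ == "__main__":
    rows = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
            ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r")]
    print(k_histograms(rows, k=2, init=[0, 3]))  # [0, 0, 0, 1, 1, 1]
```

Moving objects one at a time keeps every cluster's histograms exact without a separate recomputation step, and accepting only strict improvements means a cluster reduced to a single object (whose distance to its own histograms is zero) is never emptied.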

Taxonomy

Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Data Mining Algorithms and Applications