Cross-Entropy Clustering

Przemysław Spurek; Jacek Tabor

arXiv:1210.5594·cs.IT·May 19, 2014

TL;DR

This paper introduces Cross-Entropy Clustering (CEC), a method that automatically determines the number of clusters by removing non-informative groups and provides a simple criterion for cluster validity, with efficient algorithms especially for Gaussian models.

Contribution

The paper develops a cross-entropy-based clustering framework that automatically identifies the optimal number of clusters, is efficient and affine invariant in the Gaussian case, and recovers classical k-means as a limiting case.

Findings

01

CEC automatically finds the optimal number of clusters.

02

The Gaussian CEC approach is affine invariant and tends to form ellipsoid-shaped clusters.

03

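The method is computationally efficient because it can use the Hartigan approach, and for Gaussian densities with covariance $s \mathrm{I}$ it reduces to classical k-means as $s$ approaches zero.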

Abstract

We construct a cross-entropy clustering (CEC) theory which finds the optimal number of clusters by automatically removing groups which carry no information. Moreover, our theory gives a simple and efficient criterion to verify cluster validity. Although CEC can be built on an arbitrary family of densities, in the most important case of Gaussian CEC: (i) the division into clusters is affine invariant; (ii) the clustering tends to divide the data into ellipsoid-type shapes; (iii) the approach is computationally efficient, as we can apply the Hartigan approach. We also study with particular attention clustering based on spherical Gaussian densities and on Gaussian densities with covariance $s \mathrm{I}$. In the latter case we show that as $s$ converges to zero we obtain classical k-means clustering.
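The abstract does not state the cost function, so the following is only a minimal sketch under an assumption: that the Gaussian CEC energy of a partition is the sum over clusters of p_i * (-ln p_i + (N/2) ln(2*pi*e) + (1/2) ln det Sigma_i), where p_i is the fraction of points in cluster i, Sigma_i is its maximum-likelihood covariance, and N is the dimension. The function name `gaussian_cec_cost` and the toy data are hypothetical and chosen for illustration; the sketch only scores a given partition and does not implement the Hartigan-style optimization or the removal of clusters.

```python
import numpy as np

def gaussian_cec_cost(X, labels):
    """Assumed Gaussian CEC energy of a fixed partition (illustrative only)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    total = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        p = len(Xk) / n  # fraction of the data in this cluster
        # Maximum-likelihood covariance (normalised by the cluster size).
        cov = np.cov(Xk, rowvar=False, bias=True).reshape(d, d)
        sign, logdet = np.linalg.slogdet(cov)
        if sign <= 0:
            raise ValueError("cluster covariance is singular")
        # Cross-entropy of the cluster with respect to its best-fit Gaussian.
        h = 0.5 * d * np.log(2.0 * np.pi * np.e) + 0.5 * logdet
        total += p * (-np.log(p) + h)
    return total

# Toy data: two well-separated blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
               rng.normal(20.0, 1.0, (200, 2))])
two = np.repeat([0, 1], 200)      # the natural two-cluster partition
one = np.zeros(400, dtype=int)    # everything in a single cluster

cost_two = gaussian_cec_cost(X, two)
cost_one = gaussian_cec_cost(X, one)
print("two clusters:", cost_two, "one cluster:", cost_one)

# An invertible affine map x -> A x + b shifts the cost by ln|det A|.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([3.0, -1.0])
shift = gaussian_cec_cost(X @ A.T + b, two) - cost_two
print("shift:", shift, "ln|det A|:", np.log(abs(np.linalg.det(A))))
```

Under this assumed energy, the two-cluster partition should score lower than the single cluster on such data. The affine map adds the same constant, ln|det A|, to the cost of every partition, so the ranking of partitions is unchanged, which is one way to read the affine-invariance claim.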
