
TL;DR
This paper introduces Cross-Entropy Clustering (CEC), a method that automatically determines the number of clusters by removing non-informative groups and provides a simple criterion for cluster validity, with efficient algorithms especially for Gaussian models.
Contribution
The paper develops a new clustering framework that automatically identifies the optimal number of clusters and offers an efficient, affine-invariant approach based on cross-entropy, extending classical methods.
Findings
CEC automatically finds the optimal number of clusters.
The Gaussian CEC approach is affine invariant and tends to form ellipsoid-shaped clusters.
The method is computationally efficient and reduces to classical k-means in the limit where the scale of the Gaussian covariance approaches zero.
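To make the findings above more concrete, here is a minimal Python sketch that evaluates a Gaussian CEC cost for a fixed hard partition. It assumes the cost has the form sum_i p_i * (-ln p_i + d/2 * ln(2*pi*e) + 1/2 * ln det(Sigma_i)), where p_i is the fraction of points in cluster i, Sigma_i is its empirical covariance, and d is the data dimension. This formula is not stated in the summary, and the function name, variable names, and toy data are illustrative assumptions. The sketch only scores partitions; it does not implement the Hartigan-style optimization or the automatic removal of clusters.

```python
import numpy as np

def gaussian_cec_energy(X, labels):
    """Gaussian CEC cost of a hard partition (illustrative sketch).

    Assumed form (not given in the summary):
        sum_i p_i * (-ln p_i + d/2 * ln(2*pi*e) + 1/2 * ln det(Sigma_i))
    """
    n, d = X.shape
    energy = 0.0
    for k in np.unique(labels):
        cluster = X[labels == k]
        p = len(cluster) / n
        # Maximum-likelihood (biased) covariance of the cluster.
        cov = np.cov(cluster, rowvar=False, bias=True).reshape(d, d)
        _, logdet = np.linalg.slogdet(cov)
        energy += p * (-np.log(p) + 0.5 * d * np.log(2 * np.pi * np.e) + 0.5 * logdet)
    return energy

# Toy data (hypothetical): two well-separated unit-variance blobs in 2D.
rng = np.random.default_rng(0)
blob_a = rng.normal(size=(200, 2))
blob_b = rng.normal(size=(200, 2)) + np.array([10.0, 0.0])
X = np.vstack([blob_a, blob_b])

one_cluster = np.zeros(400, dtype=int)
two_clusters = np.repeat([0, 1], 200)

e_one = gaussian_cec_energy(X, one_cluster)
e_two = gaussian_cec_energy(X, two_clusters)

# Affine map x -> A x + b: every partition's cost shifts by ln|det A|,
# so the ranking of partitions is unchanged.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([5.0, -1.0])
shift = gaussian_cec_energy(X @ A.T + b, two_clusters) - e_two
```

Under this assumed cost, the two-blob partition scores lower than the single-cluster one. An invertible affine map adds the same constant, ln|det A|, to the cost of every partition, which is one way to see why the minimizing partition would be affine invariant.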
Abstract
We construct a cross-entropy clustering (CEC) theory which finds the optimal number of clusters by automatically removing groups which carry no information. Moreover, our theory gives a simple and efficient criterion to verify cluster validity. Although CEC can be built on an arbitrary family of densities, in the most important case of Gaussian CEC: (i) the division into clusters is affine invariant; (ii) the clustering tends to divide the data into ellipsoid-type shapes; (iii) the approach is computationally efficient, as we can apply the Hartigan approach. We also study, with particular attention, clustering based on spherical Gaussian densities and on Gaussian densities with covariance $s\mathrm{I}$. In the latter case we show that, as $s$ converges to zero, we obtain the classical k-means clustering.
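The k-means limit mentioned in the abstract can be sketched as a short derivation. The notation is assumed here and does not come from the summary: the fixed covariance is written $s\mathrm{I}$ with scale $s > 0$, $d$ is the data dimension, $n$ is the number of points, and cluster $X_i$ has weight $p_i$, mean $m_i$, and empirical covariance $\Sigma_i$. The cost is taken to be the weighted sum of $-\ln p_i$ and the cross-entropy of each cluster with respect to $\mathcal{N}(m_i, s\mathrm{I})$.

```latex
% Cross-entropy of cluster i with respect to N(m_i, sI):
H^{\times}\!\left(X_i \,\|\, \mathcal{N}(m_i, s\mathrm{I})\right)
  = \frac{d}{2}\ln(2\pi s) + \frac{1}{2s}\operatorname{tr}\Sigma_i

% Assumed CEC cost for a partition into clusters X_1, ..., X_k:
E_s = \sum_{i=1}^{k} p_i \left( -\ln p_i + \frac{d}{2}\ln(2\pi s)
      + \frac{1}{2s}\operatorname{tr}\Sigma_i \right)

% Remove the partition-independent term and rescale by 2s:
2s\left( E_s - \frac{d}{2}\ln(2\pi s) \right)
  = \sum_{i=1}^{k} p_i \operatorname{tr}\Sigma_i
    + 2s \sum_{i=1}^{k} p_i(-\ln p_i)
  \;\xrightarrow[s \to 0]{}\;
  \sum_{i=1}^{k} p_i \operatorname{tr}\Sigma_i
  = \frac{1}{n} \sum_{i=1}^{k} \sum_{x \in X_i} \lVert x - m_i \rVert^2
```

The entropy term is bounded by $\ln k$, so it vanishes after rescaling. What remains is the within-cluster sum of squares divided by $n$, which is the k-means objective.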
