A Novel Theoretical Analysis for Clustering Heteroscedastic Gaussian Data without Knowledge of the Number of Clusters
Dominique Pastor, Elsa Dupraz, Ismail Hbilou, Guillaume Ansel

TL;DR
This paper introduces CENTRE-X, a new clustering algorithm for heteroscedastic Gaussian data that does not require prior knowledge of the number of clusters, leveraging a novel theoretical analysis and the Wald kernel.
Contribution
It provides a theoretical framework for clustering heteroscedastic Gaussian data without knowing the number of clusters, and it introduces the CENTRE-X algorithm and the Wald kernel.
Findings
CENTRE-X performs comparably to or better than K-means and Mean-Shift.
The Wald kernel scales better with data dimension than Gaussian kernels.
Theoretical results guarantee that the fixed points approximate the cluster centroids when both the number of measurements per cluster and the distances between centroids are large enough.
Abstract
This paper addresses the problem of clustering measurement vectors that are heteroscedastic, in that they can have different covariance matrices. Assuming that the measurement vectors within a given cluster are Gaussian distributed around the cluster centroid, with possibly different and unknown covariance matrices, we introduce a novel cost function to estimate the centroids. The zeros of the gradient of this cost function turn out to be the fixed points of a certain function. As such, the approach generalizes the methodology employed to derive the existing Mean-Shift algorithm. As its main and novel theoretical result compared with Mean-Shift, this paper shows that the only fixed points of the identified function tend to be the cluster centroids if both the number of measurements per cluster and the distances between centroids are large enough. As a second contribution, this…
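To make the fixed-point idea concrete, the sketch below shows the classic Gaussian-kernel Mean-Shift iteration that the abstract says CENTRE-X generalizes: a point x is repeatedly replaced by the kernel-weighted mean of the measurements until x = m(x). This is plain Mean-Shift, not CENTRE-X; the paper's cost function and the Wald kernel are not specified in this summary, so they are not reproduced here. The function name `mean_shift_fixed_point`, the `bandwidth` value, and the synthetic two-cluster data are hypothetical choices for illustration only.

```python
import numpy as np

def mean_shift_fixed_point(x0, Y, bandwidth, tol=1e-8, max_iter=500):
    """Hypothetical helper: iterate x <- m(x) with the Gaussian-kernel Mean-Shift map
    (standard Mean-Shift, NOT the paper's CENTRE-X / Wald-kernel update)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Squared Euclidean distances from the current estimate to every measurement
        sq = np.sum((Y - x) ** 2, axis=1)
        # Gaussian kernel weights; subtracting the minimum only rescales all weights
        # by a common factor, so the weighted mean is unchanged but no underflow occurs
        w = np.exp(-(sq - sq.min()) / (2.0 * bandwidth ** 2))
        # Mean-Shift map m(x): kernel-weighted mean of the measurements
        x_new = w @ Y / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Synthetic heteroscedastic data: two clusters with different noise levels (assumed values)
rng = np.random.default_rng(0)
c1, c2 = np.array([0.0, 0.0]), np.array([10.0, 10.0])
Y = np.vstack([c1 + 0.5 * rng.standard_normal((200, 2)),
               c2 + 1.5 * rng.standard_normal((200, 2))])

# Start from a measurement of the first cluster and run to a fixed point
x_star = mean_shift_fixed_point(Y[0], Y, bandwidth=1.0)
print(x_star)
```

Because the clusters here are far apart relative to the bandwidth, the iteration started inside the first cluster settles at a fixed point close to its centroid `c1`, which mirrors, in this simple setting, the kind of behavior the paper analyzes for its own fixed-point function.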
