Semiparametric Elliptical Mixture Clustering for High-Dimensional Data
Long Feng, Dan Zhuang

TL;DR
This paper introduces a semiparametric elliptical mixture clustering method for high-dimensional data. It handles heavy-tailed, approximately elliptical clusters without specifying a parametric radial family, and is fitted with a generalized expectation-maximization (GEM) algorithm.
Contribution
It develops a high-dimensional clustering framework with data-driven selection of the number of clusters, together with a GEM algorithm that avoids specifying a parametric radial family.
Findings
The method is computationally feasible in high dimensions.
Establishes high-dimensional consistency for model components.
Demonstrates robustness and competitive performance in simulations and digit data.
Abstract
Clustering high-dimensional data is especially challenging when cluster distributions are heavy-tailed and only approximately elliptical. Existing high-dimensional methods are largely built for Gaussian or other light-tailed models, whereas classical robust elliptical procedures are mostly low-dimensional or rely on fully parametric radial families. We propose a semiparametric elliptical mixture clustering framework with cluster-specific centers, an unknown common radial generator, and a common sparse precision-shape matrix, together with a data-driven rule for selecting the number of clusters. A generalized expectation-maximization (GEM) algorithm is developed by combining transformed-radius estimation of the radial generator, radial-score center updates, and a Tyler-POET-GLASSO update for the common precision-shape matrix. The method avoids specifying a parametric radial family and…
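To make the "Tyler" ingredient of the precision-shape update more concrete, here is a minimal sketch of the classical Tyler M-estimator of shape, computed by fixed-point iteration. This is not the paper's Tyler-POET-GLASSO update: it assumes a single cluster with a known center and more observations than dimensions, and it omits the POET and graphical-lasso steps entirely. The function name `tyler_shape` and its parameters are hypothetical and chosen only for this illustration.

```python
import numpy as np

def tyler_shape(X, center, n_iter=100, tol=1e-8):
    """Classical Tyler shape estimate (illustrative only, assumes n > p).

    X      : (n, p) data matrix
    center : (p,) location, assumed known here
    Returns a (p, p) shape matrix normalized so that its trace equals p.
    """
    Z = X - center
    # Discard points that coincide with the center (their direction is undefined).
    Z = Z[np.linalg.norm(Z, axis=1) > 0]
    n, p = Z.shape
    S = np.eye(p)
    for _ in range(n_iter):
        # Squared Mahalanobis radii d_i = z_i^T S^{-1} z_i.
        d = np.einsum("ij,ji->i", Z, np.linalg.solve(S, Z.T))
        # Fixed-point update: weighted scatter with weights 1 / d_i.
        S_new = (p / n) * (Z.T * (1.0 / d)) @ Z
        # Shape is only defined up to scale, so fix the trace at p.
        S_new *= p / np.trace(S_new)
        converged = np.linalg.norm(S_new - S, ord="fro") < tol
        S = S_new
        if converged:
            break
    return S
```

Because each observation is weighted by the inverse of its squared radius, the estimate depends only on the directions of the centered points. That is why it needs no assumption on the radial distribution, which is consistent with the abstract's emphasis on avoiding a parametric radial family.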
