Unsupervised Learning Under a General Semiparametric Clusterwise Elliptical Distribution: Efficient Estimation, Optimal Clustering, and Consistent Cluster Selection
Jen-Chieh Teng, Sheng-Hsin Fan, Chin-Tsang Chiang, Ming-Yueh Huang, Alvin Lim

TL;DR
This paper develops a semiparametric clustering method for elliptical distributions, providing efficient estimation, optimal clustering, and consistent cluster number selection, with demonstrated strong finite-sample performance.
Contribution
It introduces a semiparametric clusterwise elliptical model, a two-phase estimation procedure whose initial algorithm has guaranteed convergence, and a semiparametric information criterion for selecting the number of clusters.
Findings
The initial estimator consistently recovers the true clusters
The second-phase estimator achieves asymptotic semiparametric efficiency
Demonstrates strong practical performance in applications
Abstract
We introduce a general semiparametric clusterwise elliptical distribution to assess how latent cluster structure shapes continuous outcomes. Using a subjectwise representation, we first estimate cluster-specific mean vectors and a cluster-invariant scatter matrix by minimizing a weighted sum of squares criterion augmented with a separation penalty; we provide an initialization scheme and a computational algorithm with guaranteed convergence. This initial estimator consistently recovers the true clusters and seeds a second phase that alternates pseudo-maximum likelihood (or pseudo-maximum marginal likelihood) estimation with cluster reassignment, yielding asymptotic semiparametric efficiency and an optimal clustering that asymptotically maximizes the probability of correct membership. We also propose a semiparametric information criterion for selecting the number of clusters. Monte Carlo…
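As a rough illustration of the first phase described in the abstract, the sketch below alternates between assigning each observation to the nearest cluster mean (in Mahalanobis distance under a shared scatter matrix) and re-estimating the cluster-specific means and the cluster-invariant scatter. It is not the authors' estimator: the separation penalty, their initialization scheme, the second pseudo-likelihood phase, and the information criterion are all omitted. The function name `pooled_scatter_clustering`, its parameters, and the random initialization are hypothetical choices made for this example.

```python
import numpy as np

def pooled_scatter_clustering(X, k, init_means=None, n_iter=50, seed=0):
    """Alternate Mahalanobis assignment with mean / pooled-scatter updates.

    Simplified stand-in for a weighted sum of squares criterion with
    cluster-specific means and one shared scatter matrix (no penalty term).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if init_means is None:
        # Assumed initialization: k distinct observations as starting means.
        means = X[rng.choice(n, size=k, replace=False)].astype(float)
    else:
        means = np.array(init_means, dtype=float)
    scatter = np.eye(p)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        inv = np.linalg.inv(scatter)
        diff = X[:, None, :] - means[None, :, :]            # shape (n, k, p)
        d2 = np.einsum("nkp,pq,nkq->nk", diff, inv, diff)   # squared distances
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                           # assignments stable
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:                            # keep old mean if empty
                means[j] = members.mean(axis=0)
        resid = X - means[labels]
        # Pooled within-cluster scatter, with a small ridge for invertibility.
        scatter = resid.T @ resid / n + 1e-8 * np.eye(p)
    return labels, means, scatter
```

Calling `pooled_scatter_clustering(X, k=2)` on an `(n, p)` array returns the labels, the `k` estimated means, and the shared `p x p` scatter matrix. Because the scatter is pooled across clusters, every cluster is measured in the same metric, which mirrors the cluster-invariant scatter assumption stated in the abstract.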
