TL;DR
This paper introduces G-block covariance models for variable clustering, derives minimax cluster separation thresholds, and develops two algorithms, COD and PECOK, that achieve minimax-optimal cluster recovery, supported by theoretical analysis and empirical validation.
Contribution
It proposes the G-block covariance model framework, develops two minimax-optimal algorithms for variable clustering, and provides the first statistical analysis of convex relaxation-based clustering methods.
Findings
COD and PECOK algorithms achieve minimax-optimal recovery thresholds.
Spectral clustering requires higher separation for exact recovery.
Extensive simulations and data analysis validate the methods.
Abstract
Model-based clustering defines population-level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of G-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations with all other variables. This can arise, for instance, when groups of variables are noise-corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a G-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the…
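The abstract describes the G-block structure only in words. The sketch below illustrates it under stated assumptions: it takes the latent-factor form X = A Z + E, where A is a 0/1 membership matrix, Z has covariance C and E is independent noise with diagonal covariance Γ, so that Σ = A C Aᵀ + Γ. It then computes one natural dissimilarity that matches the "similar associations with all other variables" idea: the largest absolute difference between the covariances of variables a and b with every third variable c. The helper names (membership_matrix, population_covariance, sample_data, max_cov_difference) are hypothetical, and this dissimilarity is an illustrative choice, not necessarily the COD statistic or either of the separation metrics defined in the paper.

```python
import numpy as np


def membership_matrix(cluster_sizes):
    """Hypothetical helper: p x K matrix A with A[a, k] = 1 iff variable a is in cluster k."""
    labels = np.repeat(np.arange(len(cluster_sizes)), cluster_sizes)
    A = np.zeros((labels.size, len(cluster_sizes)))
    A[np.arange(labels.size), labels] = 1.0
    return A, labels


def population_covariance(cluster_sizes, latent_cov, noise_var):
    """Assumed G-block form: Sigma = A C A^T + Gamma, with Gamma = noise_var * I."""
    A, labels = membership_matrix(cluster_sizes)
    Sigma = A @ latent_cov @ A.T + noise_var * np.eye(A.shape[0])
    return Sigma, labels


def sample_data(n, cluster_sizes, latent_cov, noise_var, seed=0):
    """Draw n rows of X = A Z + E with Z ~ N(0, C) and E ~ N(0, noise_var * I)."""
    rng = np.random.default_rng(seed)
    A, labels = membership_matrix(cluster_sizes)
    Z = rng.multivariate_normal(np.zeros(len(cluster_sizes)), latent_cov, size=n)
    E = np.sqrt(noise_var) * rng.standard_normal((n, A.shape[0]))
    return Z @ A.T + E, labels


def max_cov_difference(S):
    """Illustrative dissimilarity: d[a, b] = max over c outside {a, b} of |S[a, c] - S[b, c]|."""
    p = S.shape[0]
    d = np.zeros((p, p))
    for a in range(p):
        for b in range(a + 1, p):
            keep = np.ones(p, dtype=bool)
            keep[[a, b]] = False  # compare a and b only through the other variables
            d[a, b] = d[b, a] = np.abs(S[a, keep] - S[b, keep]).max()
    return d


if __name__ == "__main__":
    # Assumed latent covariance C for K = 3 clusters of 3 variables each.
    C = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])
    sizes = [3, 3, 3]
    Sigma, labels = population_covariance(sizes, C, noise_var=0.5)
    same = labels[:, None] == labels[None, :]

    d_pop = max_cov_difference(Sigma)
    print("population: max within-cluster d =", d_pop[same].max())
    print("population: min between-cluster d =", d_pop[~same].min())

    X, _ = sample_data(2000, sizes, C, noise_var=0.5)
    d_hat = max_cov_difference(np.cov(X, rowvar=False))  # columns are variables
    print("sample: max within-cluster d =", d_hat[same].max())
    print("sample: min between-cluster d =", d_hat[~same].min())
```

At the population level, two variables in the same cluster have identical covariances with every third variable, so their dissimilarity is exactly zero, while variables in different clusters are separated whenever the corresponding rows of C differ. With a finite sample the within-cluster values become small but nonzero, which is why how far apart the clusters are, relative to the sampling noise, governs whether exact recovery is possible.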
