Model Assisted Variable Clustering: Minimax-optimal Recovery and Algorithms

Florentina Bunea; Christophe Giraud; Xi Luo; Martin Royer; Nicolas Verzelen

arXiv:1508.01939·stat.ME·December 14, 2018

TL;DR

This paper introduces G-block covariance models for variable clustering, derives minimax cluster separation thresholds, and develops two algorithms, COD and PECOK, that achieve optimal recovery, supported by theoretical analysis and empirical validation.

Contribution

It proposes the G-block covariance model framework, develops two minimax-optimal algorithms for variable clustering, and provides the first statistical analysis of convex relaxation-based clustering methods.

Findings

01

COD and PECOK algorithms achieve minimax-optimal recovery thresholds.

02

Spectral clustering requires higher separation for exact recovery.

03

Extensive simulations and data analysis validate the methods.

Abstract

Model-based clustering defines population-level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of G-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations with all other variables. This can arise, for instance, when groups of variables are noise-corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a G-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the…
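As a rough illustration of the latent-factor setting the abstract describes, the sketch below builds a population covariance of the form Sigma = A C A^T + diag(gamma), where A is a group membership matrix, C is the covariance of the latent factors, and gamma holds per-variable noise variances. It then compares how two variables associate with all remaining variables. The function names `g_block_covariance` and `cod_dissimilarity`, the parameter values, and the specific dissimilarity (a maximum absolute difference of covariances over the other variables) are assumptions made for illustration; they are one natural reading of "similar associations with all other variables" and are not taken from this page.

```python
import numpy as np

def g_block_covariance(labels, C, gamma):
    """Illustrative population covariance: Sigma = A C A^T + diag(gamma).

    labels: group index of each variable (hypothetical example input)
    C:      K x K covariance of the latent factors
    gamma:  per-variable noise variances
    """
    labels = np.asarray(labels)
    K = C.shape[0]
    A = np.eye(K)[labels]          # p x K one-hot membership matrix
    return A @ C @ A.T + np.diag(gamma)

def cod_dissimilarity(sigma):
    """For each pair (a, b): max over c not in {a, b} of |sigma[a, c] - sigma[b, c]|."""
    p = sigma.shape[0]
    d = np.zeros((p, p))
    for a in range(p):
        for b in range(a + 1, p):
            mask = np.ones(p, dtype=bool)
            mask[[a, b]] = False   # exclude a and b themselves
            d[a, b] = d[b, a] = np.max(np.abs(sigma[a, mask] - sigma[b, mask]))
    return d

# Two groups of three variables each; all values are made up for the example.
labels = [0, 0, 0, 1, 1, 1]
C = np.array([[1.0, 0.2],
              [0.2, 1.5]])
gamma = np.array([0.5, 1.0, 0.3, 0.8, 0.4, 0.6])

sigma = g_block_covariance(labels, C, gamma)
d = cod_dissimilarity(sigma)

print("within-group dissimilarity (vars 0, 1):", d[0, 1])
print("between-group dissimilarity (vars 0, 3):", d[0, 3])
```

In this toy setup, variables in the same group have a dissimilarity of zero even though their noise variances differ, because the diagonal entries are excluded from the comparison, while variables in different groups are separated by about 1.3. With a sample covariance in place of the population one, the within-group values would be small rather than exactly zero.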

Code & Models

Repositories

martinroyer/pecok (Official)