Convex clustering via $\ell_1$ fusion penalization
Peter Radchenko, Gourab Mukherjee

TL;DR
This paper analyzes the asymptotic behavior of a convex clustering method with an $\ell_1$ fusion penalty, establishing its consistency and convergence rates, and proposes a post-processing step for better cluster number estimation, validated through simulations and real data.
Contribution
It provides the first thorough asymptotic analysis of convex clustering with an $\ell_1$ fusion penalty and introduces a post-processing modification for improved cluster number estimation.
Findings
The sample clustering procedure consistently estimates the population clustering.
The convergence rates of the estimator are derived.
The proposed method effectively estimates the number of clusters in practice.
Abstract
We study the large sample behavior of a convex clustering framework, which minimizes the sample within-cluster sum of squares under an $\ell_1$ fusion constraint on the cluster centroids. This recently proposed approach has been gaining in popularity; however, its asymptotic properties have remained mostly unknown. Our analysis is based on a novel representation of the sample clustering procedure as a sequence of cluster splits determined by a sequence of maximization problems. We use this representation to provide a simple and intuitive formulation for the population clustering procedure. We then demonstrate that the sample procedure consistently estimates its population analog, and derive the corresponding rates of convergence. The proof conducts a careful simultaneous analysis of a collection of M-estimation problems, whose cardinality grows together with the sample size. Based on…
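To make the objective in the abstract concrete, here is a minimal one-dimensional sketch. It assumes the penalized (Lagrangian) form $\frac{1}{2}\sum_i (x_i - \alpha_i)^2 + \lambda \sum_{i<j} |\alpha_i - \alpha_j|$ with equal weights on all pairs; the abstract states a constraint, so this form, the parameter `lam`, and the function names `pava` and `l1_fusion_centroids` are illustrative assumptions rather than the authors' notation or algorithm. The sketch uses the fact that, once the data are sorted, the pairwise $\ell_1$ penalty is linear in the ordered centroids, so the fit reduces to an isotonic regression solved by pool-adjacent-violators.

```python
import numpy as np


def pava(y):
    """Least-squares nondecreasing fit to y via pool-adjacent-violators."""
    means, sizes = [], []
    for v in y:
        means.append(float(v))
        sizes.append(1)
        # Merge the last two blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((m1 * s1 + m2 * s2) / (s1 + s2))
            sizes.append(s1 + s2)
    return np.repeat(means, sizes)


def l1_fusion_centroids(x, lam):
    """Hypothetical helper: 1-D fused centroids for the assumed objective.

    Minimizes 0.5 * sum_i (x_i - a_i)^2 + lam * sum_{i<j} |a_i - a_j|.
    For sorted data the penalty equals sum_i (2i - n - 1) * a_(i), so the
    solution is the isotonic regression of x_(i) - lam * (2i - n - 1).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    order = np.argsort(x, kind="stable")
    c = 2 * np.arange(1, n + 1) - n - 1  # penalty coefficient per rank
    fitted = pava(x[order] - lam * c)
    alpha = np.empty(n)
    alpha[order] = fitted  # undo the sort
    return alpha


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    for lam in (0.0, 0.1, 10.0):
        alpha = l1_fusion_centroids(data, lam)
        n_clusters = np.unique(np.round(alpha, 8)).size
        print(lam, n_clusters, alpha)
```

Points that share a fitted centroid value form one cluster. In this sketch, `lam = 0` leaves every point in its own cluster, a moderate `lam` fuses each well-separated group, and a large `lam` fuses everything at the sample mean. The fused centroids are also shrunk toward each other, which is one reason a separate step for choosing the number of clusters is of interest.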
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · SARS-CoV-2 detection and testing
