Regularized K-means through hard-thresholding

Jakob Raymaekers; Ruben H. Zamar

arXiv:2010.00950 · stat.ML · October 5, 2020

Open Access

TL;DR

This paper introduces HT K-means, a regularized clustering method that uses an $\ell_0$ penalty to induce sparsity in the variables, supported by theoretical analysis, simulation comparisons, and real data applications.

Contribution

It compares several strategies for penalizing the size of the cluster centers through simulation and theoretical analysis, and on that basis proposes HT K-means, which uses an $\ell_0$ penalty.

Findings

01

HT K-means compares favorably with the most popular regularized K-means methods in an extensive simulation study.

02

Several techniques for selecting the tuning parameter are discussed and compared.

03

Graphical displays are used in the real data examples to gain more insight into the datasets.

Abstract

We study a framework of regularized $K$-means methods based on direct penalization of the size of the cluster centers. Different penalization strategies are considered and compared through simulation and theoretical analysis. Based on the results, we propose HT $K$-means, which uses an $\ell_0$ penalty to induce sparsity in the variables. Different techniques for selecting the tuning parameter are discussed and compared. The proposed method stacks up favorably with the most popular regularized $K$-means methods in an extensive simulation study. Finally, HT $K$-means is applied to several real data examples. Graphical displays are presented and used in these examples to gain more insight into the datasets.
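The abstract does not spell out the objective or the algorithm, so the following is only a rough illustration of how an $\ell_0$ penalty on the cluster centers can lead to hard-thresholding. It assumes a penalized objective of the form $\sum_i \|x_i - \mu_{c(i)}\|^2 + n\lambda \cdot \#\{j : \mu_{\cdot j} \neq 0\}$ on column-centered data. Under that assumption, for fixed cluster assignments, the best center update is to compute the cluster means and then zero every variable whose between-cluster sum of squares $\sum_k n_k \bar{x}_{kj}^2$ does not exceed $n\lambda$. The function name `ht_kmeans_sketch`, its parameters, and the scaling of the threshold are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np


def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def ht_kmeans_sketch(X, k, lam, init=None, n_iter=100, seed=0):
    """Lloyd-style iterations with a hard-thresholding step on the centers.

    Assumed objective (an illustration, not taken from the source):
        sum_i ||x_i - mu_{c(i)}||^2 + n * lam * (number of nonzero center columns)
    X is assumed to be column-centered, so a zeroed column means that all
    cluster centers sit at the overall mean of that variable.
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    rng = np.random.default_rng(seed)
    if init is None:
        # Hypothetical default: k distinct observations chosen at random.
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.array(init, dtype=float)

    for _ in range(n_iter):
        labels = _assign(X, centers)
        counts = np.bincount(labels, minlength=k)

        # Unpenalized update: cluster means (empty clusters keep their center).
        new_centers = centers.copy()
        for c in range(k):
            if counts[c] > 0:
                new_centers[c] = X[labels == c].mean(axis=0)

        # Hard-thresholding: keeping variable j lowers the within-cluster sum
        # of squares by sum_k n_k * mean_kj^2 but costs n * lam in penalty.
        gain = (counts[:, None] * new_centers ** 2).sum(axis=0)
        new_centers[:, gain <= n * lam] = 0.0

        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break

    return centers, _assign(X, centers)
```

A call such as `centers, labels = ht_kmeans_sketch(X, k=2, lam=0.5)` returns a centers matrix whose all-zero columns mark the variables this sketch treats as uninformative; increasing `lam` zeroes more columns. Like plain K-means, the result depends on the starting centers, so several starts are advisable in practice.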

Taxonomy

Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Statistical Methods and Inference