Quantile-based clustering

Christian Hennig; Cinzia Viroli; Laura Anderlucci

arXiv:1806.10403 · stat.ME · November 12, 2019

TL;DR

The paper introduces $K$-quantiles clustering, a flexible, scalable nonparametric method that handles skewness and high-dimensional data, with proven consistency and competitive performance in simulations and real datasets.

Contribution

It presents a novel $K$-quantiles clustering algorithm that is simple, efficient, and adaptable to skewed and high-dimensional data, with theoretical guarantees.

Findings

01

Proven consistency of $K$-quantiles clustering.

02

Performance comparable or superior to a number of popular clustering methods in simulations.

03

Effective application to high-dimensional microarray data.

Abstract

A new cluster analysis method, $K$-quantiles clustering, is introduced. $K$-quantiles clustering can be computed by a simple greedy algorithm in the style of the classical Lloyd's algorithm for $K$-means. It can be applied to large and high-dimensional datasets. It allows for within-cluster skewness and internal variable scaling based on within-cluster variation. Different versions allow for different levels of parsimony and computational efficiency. Although $K$-quantiles clustering is conceived as nonparametric, it can be connected to a fixed partition model of generalized asymmetric Laplace distributions. The consistency of $K$-quantiles clustering is proved, and it is shown that $K$-quantiles clusters correspond to well separated mixture components in a nonparametric mixture. In a simulation, $K$-quantiles clustering is compared with a number of popular clustering methods with good…
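To make the Lloyd-style idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single fixed quantile level `theta` shared by all variables and clusters, uses the standard quantile (check) loss as the point-to-cluster discrepancy, and omits the internal variable scaling and the different versions the abstract mentions. The names `k_quantiles`, `check_loss`, `empirical_quantile`, `discrepancy`, and the `init` parameter are hypothetical and chosen only for illustration.

```python
import math
import random


def check_loss(u, theta):
    """Quantile (check) loss: theta*u for u >= 0, (theta - 1)*u otherwise."""
    return theta * u if u >= 0 else (theta - 1.0) * u


def empirical_quantile(values, theta):
    """Order statistic ceil(n*theta): one minimiser of the summed check loss."""
    s = sorted(values)
    return s[max(0, math.ceil(theta * len(s)) - 1)]


def discrepancy(point, center, theta):
    """Sum of check losses over variables (assumption: no variable scaling)."""
    return sum(check_loss(x - q, theta) for x, q in zip(point, center))


def k_quantiles(data, k, theta=0.5, init=None, n_iter=100, seed=0):
    """Alternate assignment and quantile updates, as in Lloyd's algorithm.

    Assumption: theta is held fixed; this is a simplification for illustration.
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(data, k)
    centers = [list(c) for c in start]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to the cluster with the smallest discrepancy.
        new_labels = [
            min(range(k), key=lambda c: discrepancy(p, centers[c], theta))
            for p in data
        ]
        if new_labels == labels:
            break  # partition is stable, so the centers would not change either
        labels = new_labels
        # Update step: per-variable theta-quantile of each non-empty cluster.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = [empirical_quantile(col, theta) for col in zip(*members)]
    return labels, centers


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (10.0, 10.0), (10.1, 10.2), (10.2, 10.1)]
labels, centers = k_quantiles(points, k=2, theta=0.5, init=[points[0], points[3]])
print(labels, centers)
```

With `theta=0.5` the discrepancy is half the L1 distance and the update is a per-variable median, so the sketch reduces to a $K$-medians-like procedure. Choosing `theta` away from 0.5 moves each cluster's reference point towards one tail, which is one way to accommodate within-cluster skewness.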
