K-groups: A Generalization of K-means Clustering

Songzi Li; Maria L. Rizzo

arXiv:1711.04359 · stat.ME · November 15, 2017 · 1 citation


TL;DR

This paper introduces k-groups, a distribution-based clustering method using energy distance, which generalizes k-means and performs better on skewed, heavy-tailed, or high-dimensional data.

Contribution

The paper proposes a novel class of clustering algorithms based on energy distance, extending k-means to handle non-spherical, skewed, and heavy-tailed distributions.

Findings

1. k-groups performs as well as k-means when clusters are well separated and normally distributed.

2. k-groups outperforms k-means on skewed, heavy-tailed, and high-dimensional data.

3. k-groups by first variation is consistent as dimension increases.

Abstract

We propose a new class of distribution-based clustering algorithms, called k-groups, based on energy distance between samples. The energy distance clustering criterion assigns observations to clusters according to a multi-sample energy statistic that measures the distance between distributions. The energy distance determines a consistent test for equality of distributions, and it is based on a population distance that characterizes equality of distributions. The k-groups procedure therefore generalizes the k-means method, which separates clusters that have different means. We propose two k-groups algorithms: k-groups by first variation; and k-groups by second variation. The implementation of k-groups is partly based on Hartigan and Wong's algorithm for k-means. The algorithm is generalized from moving one point on each iteration (first variation) to moving $m$ $(m > 1)$ points. For…
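The energy distance that the abstract refers to can be illustrated with a small sketch. The code below is not the authors' implementation; it computes the standard two-sample energy statistic (twice the mean between-sample distance minus the two mean within-sample distances) and a within-cluster dispersion built from pairwise distances. The function names, the `alpha` exponent parameter, and the toy data are assumptions introduced here for illustration only.

```python
import math

def mean_pairwise_distance(a, b, alpha=1.0):
    """Mean of |x - y|^alpha over all pairs with x in a and y in b.

    Points are sequences of floats of equal length (hypothetical convention).
    """
    total = sum(math.dist(x, y) ** alpha for x in a for y in b)
    return total / (len(a) * len(b))

def energy_distance(a, b, alpha=1.0):
    """Two-sample energy statistic in V-statistic form (assumed form)."""
    return (2.0 * mean_pairwise_distance(a, b, alpha)
            - mean_pairwise_distance(a, a, alpha)
            - mean_pairwise_distance(b, b, alpha))

def within_dispersion(a, alpha=1.0):
    """Within-cluster dispersion: (n / 2) times the mean pairwise distance."""
    return 0.5 * len(a) * mean_pairwise_distance(a, a, alpha)

# Toy data (hypothetical): two small, well-separated clusters in the plane.
cluster_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
cluster_b = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]

print(energy_distance(cluster_a, cluster_b))   # positive for separated samples
print(within_dispersion(cluster_a, alpha=2.0)) # sum of squares about the centroid
```

With `alpha=2.0` the dispersion reduces to the sum of squared distances from the cluster centroid, which is the quantity k-means minimizes. This is one way to see why a criterion built on pairwise distances can contain k-means as a special case, while smaller exponents respond to differences in distribution beyond the mean.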


Code & Models

Repositories

mariarizzo/kgroups


Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Data Mining Algorithms and Applications