Compressive K-means

Nicolas Keriven (PANAMA), Nicolas Tremblay (GIPSA-CICS), Yann Traonmilin (PANAMA), Rémi Gribonval (PANAMA)

arXiv:1610.08738·cs.LG·February 13, 2017

TL;DR

The paper introduces Compressive K-means (CKM), a scalable clustering method that estimates cluster centers from a compressed sketch of the dataset rather than from the data itself. On large datasets it runs much faster than repeated Lloyd-Max runs while achieving similar clustering quality.

Contribution

CKM is a memory-efficient clustering approach that operates on a compressed sketch of the data. Given the sketch, its computational cost is independent of the dataset size, and it is almost insensitive to initialization.

Findings

01

CKM performs similarly to Lloyd-Max with a sketch size proportional to the number of centroids times the ambient dimension, independent of the size of the original dataset.

02

On a dataset of 10^7 data points, CKM can run two orders of magnitude faster than five replicates of Lloyd-Max.

03

CKM achieves lower classification errors on handwritten digit data.

Abstract

The Lloyd-Max algorithm is a classical approach to perform K-means clustering. Unfortunately, its cost becomes prohibitive as the training dataset grows large. We propose a compressive version of K-means (CKM) that estimates cluster centers from a sketch, i.e., from a drastically compressed representation of the training dataset. We demonstrate empirically that CKM performs similarly to Lloyd-Max, for a sketch size proportional to the number of centroids times the ambient dimension, and independent of the size of the original dataset. Given the sketch, the computational complexity of CKM is also independent of the size of the dataset. Unlike Lloyd-Max, which requires several replicates, we further demonstrate that CKM is almost insensitive to initialization. For a large dataset of 10^7 data points, we show that CKM can run two orders of magnitude faster than five replicates of…
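The abstract does not spell out how the sketch is built. As a rough illustration only, the example below assumes one common construction: averaging random Fourier features exp(-i <w_j, x>) over all data points. The function names, the Gaussian frequency distribution, the toy data, and the factor 5 in the sketch size are all assumptions made for this example, and the step that recovers centroids from the sketch is not shown.

```python
import cmath
import random


def draw_frequencies(m, dim, scale=1.0, seed=0):
    """Draw m random frequency vectors from an isotropic Gaussian (assumed choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(m)]


def sketch(points, freqs):
    """Average the complex exponentials exp(-i * <w, x>) over all points.

    The result has len(freqs) entries regardless of how many points are given.
    """
    sums = [0j] * len(freqs)
    count = 0
    for x in points:  # single pass over the data
        count += 1
        for j, w in enumerate(freqs):
            phase = sum(wk * xk for wk, xk in zip(w, x))
            sums[j] += cmath.exp(-1j * phase)
    return [s / count for s in sums]


# Toy data: two hypothetical clusters in 2-D, 500 points each.
rng = random.Random(1)
data = [[rng.gauss(c, 0.1), rng.gauss(c, 0.1)]
        for c in (0.0, 3.0) for _ in range(500)]

K, dim = 2, 2
m = 5 * K * dim  # sketch size proportional to K * dim; the factor 5 is arbitrary
freqs = draw_frequencies(m, dim)
z = sketch(data, freqs)
print(len(z))  # 20 complex numbers, whether the dataset has 10^3 or 10^7 points
```

Because the sketch is a plain average, it can be accumulated in one pass over the data, and its length depends only on the number of frequencies, not on the number of points.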

Taxonomy

Topics: Face and Expression Recognition · Advanced Clustering Algorithms Research · Anomaly Detection Techniques and Applications