Seeding K-Means using Method of Moments

Sayantan Dasgupta

arXiv:1511.05933·cs.LG·November 1, 2016

TL;DR

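This paper introduces a seeding method for K-means clustering based on factorizations of higher order moments for bounded data. It extracts the initial means in O(1) passes through the dataset, compared with the O(K) sequential passes of K-means++, and its final cost is provably within O(√K) of the optimal cost, which makes seeding cheaper on large datasets.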

Contribution

The paper presents a seeding technique based on factorizations of higher order moments that requires only O(1) passes through the dataset and comes with a provable bound on the final clustering cost relative to the optimum.

Findings

01

Requires only O(1) passes through the data for seeding

02

Guarantees final cost within O(√K) of optimal

03

Outperforms existing seeding methods on benchmarks

Abstract

K-means is one of the most widely used clustering algorithms in Data Mining applications. It attempts to minimize the sum of the squared Euclidean distances of the points in each cluster from the mean of that cluster. However, K-means suffers from the local minima problem and is not guaranteed to converge to the optimal cost. K-means++ tries to address the problem by seeding the means using a distance-based sampling scheme. However, seeding the means in K-means++ needs $O(K)$ sequential passes through the entire dataset, and this can be very costly for large datasets. Here we propose a method of seeding the initial means based on factorizations of higher order moments for bounded data. Our method takes $O(1)$ passes through the entire dataset to extract the initial set of means, and its final cost can be proven to be within $O(\sqrt{K})$ of the…
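To make the cost argument concrete, here is a minimal sketch of the K-means objective and of standard K-means++ seeding (distance-based sampling). It illustrates the baseline the abstract compares against, not the moment-based method the paper proposes. The function names (`sq_dist`, `kmeans_cost`, `kmeanspp_seed`) and the pass counter are illustrative assumptions, and the sketch assumes the dataset contains at least `k` distinct points.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, means):
    """K-means objective: sum of squared distances to the nearest mean."""
    return sum(min(sq_dist(p, m) for m in means) for p in points)


def kmeanspp_seed(points, k, rng):
    """Standard K-means++ seeding; returns (means, number of full data passes).

    Hypothetical helper for illustration. Assumes at least k distinct points.
    """
    means = [rng.choice(points)]  # first mean: uniform at random
    d2 = [float("inf")] * len(points)
    passes = 0
    while len(means) < k:
        # One full pass over the data: refresh each point's squared
        # distance to its nearest already-chosen mean.
        d2 = [min(d, sq_dist(p, means[-1])) for d, p in zip(d2, points)]
        passes += 1
        # Sample the next mean with probability proportional to d2.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        means.append(points[idx])
    return means, passes


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0),
            (5.1, 5.0), (10.0, 0.0), (10.1, 0.0)]
    seeds, n_passes = kmeanspp_seed(data, 3, rng)
    print("seeds:", seeds)
    print("passes over the data:", n_passes)
    print("seeding cost:", kmeans_cost(data, seeds))
```

Each new mean depends on distances to all previously chosen means, so the passes cannot be merged: the sketch makes `k - 1` sequential scans, which is the $O(K)$ behaviour the abstract describes.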

Taxonomy

Topics: Face and Expression Recognition · Advanced Clustering Algorithms Research · Data Management and Algorithms