Fixed-sized clusters $k$-Means

Mikko I. Malinen; Pasi Fränti

arXiv:2501.16113·cs.LG·January 28, 2025

TL;DR

This paper introduces a $k$-means clustering algorithm that optimizes mean square error under given cluster sizes, with balanced clustering (all clusters equal in size) as a direct application. The assignment phase is solved with the Hungarian algorithm in $O(n^3)$ time.

Contribution

The paper proposes a $k$-means algorithm for given cluster sizes in which the assignment phase is posed as an assignment problem and solved with the Hungarian algorithm, making datasets of more than 5000 points tractable.

Findings

01 Clusters datasets of more than 5000 points.

02 Optimizes mean square error for given cluster sizes.

03 Supports balanced clustering, with an $O(n^3)$ assignment phase.

Abstract

We present a $k$-means-based clustering algorithm, which optimizes the mean square error, for given cluster sizes. A straightforward application is balanced clustering, where the sizes of each cluster are equal. In the $k$-means assignment phase, the algorithm solves an assignment problem using the Hungarian algorithm. This makes the assignment phase time complexity $O(n^{3})$. This enables clustering of datasets of size more than 5000 points.
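To make the assignment phase concrete, the following is a minimal sketch, not the authors' implementation. It assumes each cluster is expanded into as many "slots" as its prescribed size, so that matching $n$ points to $n$ slots is a square assignment problem. The function name `fixed_size_kmeans`, its parameters, and the random initialization are hypothetical choices for illustration. The assignment problem is solved with SciPy's `linear_sum_assignment`, which SciPy documents as a modified Jonker-Volgenant algorithm rather than the Hungarian algorithm named in the abstract; both return an optimal assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def fixed_size_kmeans(X, sizes, n_iter=100, seed=0):
    """Sketch of k-means with prescribed cluster sizes (illustrative only)."""
    X = np.asarray(X, dtype=float)
    sizes = np.asarray(sizes, dtype=int)
    n, k = len(X), len(sizes)
    if sizes.sum() != n or (sizes <= 0).any():
        raise ValueError("sizes must be positive and sum to the number of points")

    # slot_cluster[j] is the cluster owning slot j; cluster c owns sizes[c] slots.
    slot_cluster = np.repeat(np.arange(k), sizes)

    # Assumption: initialize centroids with k distinct random data points.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)

    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Expand to an n x n point-to-slot cost matrix and solve the assignment.
        rows, cols = linear_sum_assignment(d2[:, slot_cluster])
        new_labels = np.empty(n, dtype=int)
        new_labels[rows] = slot_cluster[cols]
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update phase: each centroid becomes the mean of its assigned points.
        centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    return labels, centroids
```

Calling `fixed_size_kmeans(X, [n // 2, n // 2])` for even `n` gives the balanced two-cluster case. The $n \times n$ cost matrix is what drives the cubic assignment cost mentioned in the abstract, and it also means memory grows quadratically with the number of points.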

Taxonomy

Topics: Advanced Clustering Algorithms Research