Fixed-sized clusters $k$-Means
Mikko I. Malinen, Pasi Fränti

TL;DR
This paper introduces a $k$-means variant that minimizes mean square error subject to given cluster sizes. Setting all sizes equal yields balanced clustering, and the assignment step is solved as an assignment problem with the Hungarian algorithm.
Contribution
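The paper proposes a $k$-means algorithm with fixed cluster sizes in which the assignment phase is solved as an assignment problem using the Hungarian algorithm. The authors report that this makes it possible to cluster datasets of more than 5000 points.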
Findings
Clusters datasets of more than 5000 points.
Optimizes mean square error for fixed cluster sizes.
Runs the assignment phase in $O(n^3)$ time; equal cluster sizes give balanced clustering.
Abstract
We present a $k$-means-based clustering algorithm, which optimizes the mean square error, for given cluster sizes. A straightforward application is balanced clustering, where the sizes of each cluster are equal. In the $k$-means assignment phase, the algorithm solves an assignment problem using the Hungarian algorithm. This makes the assignment phase time complexity $O(n^3)$. This enables clustering of datasets of size more than 5000 points.
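The assignment phase described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each cluster is expanded into as many "slots" as its required size, so that matching the $n$ points to the $n$ slots at minimum total squared distance enforces the sizes exactly. The function name `fixed_size_kmeans`, its parameters, and the random initialization are hypothetical choices made for illustration. The sketch calls SciPy's `linear_sum_assignment`, an exact linear assignment solver that is not necessarily the Hungarian algorithm itself.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def fixed_size_kmeans(X, sizes, n_iter=100, seed=0):
    """Illustrative k-means with prescribed (positive) cluster sizes.

    X     : (n, d) array of points
    sizes : k positive integers that sum to n
    Returns (labels, centroids).
    """
    X = np.asarray(X, dtype=float)
    sizes = np.asarray(sizes, dtype=int)
    n, k = X.shape[0], len(sizes)
    if sizes.sum() != n or (sizes <= 0).any():
        raise ValueError("sizes must be positive and sum to the number of points")

    # Assumption: centroids start at k distinct, randomly chosen data points.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(n, size=k, replace=False)]

    # slot_cluster[s] is the cluster owning slot s; cluster j owns sizes[j] slots.
    slot_cluster = np.repeat(np.arange(k), sizes)

    labels = None
    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Expand to an (n, n) point-to-slot cost matrix.
        cost = d2[:, slot_cluster]
        # Exact minimum-cost one-to-one matching of points to slots.
        rows, cols = linear_sum_assignment(cost)
        new_labels = np.empty(n, dtype=int)
        new_labels[rows] = slot_cluster[cols]

        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignment unchanged, so the centroids are already current
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    return labels, centroids
```

Calling `fixed_size_kmeans(X, [n // k] * k)` with $n$ divisible by $k$ gives the balanced case. The $n \times n$ cost matrix is what drives the cubic cost of the assignment phase, so memory and running time grow quickly with $n$.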
Taxonomy
Topics: Advanced Clustering Algorithms Research
