k^2-means for fast and accurate large scale clustering
Eirikur Agustsson, Radu Timofte, Luc Van Gool

TL;DR
k^2-means is a clustering algorithm that accelerates large-scale clustering by combining a divisive initialization with an assignment step restricted to each point's k_n nearest clusters, converging faster while reaching solution energies comparable to k-means++.
Contribution
It introduces k^2-means, a scalable clustering method with a new initialization and assignment strategy that reduces computational complexity for large datasets.
Findings
k^2-means is orders of magnitude faster than standard methods.
It achieves low-energy solutions comparable to those of k-means++.
The method performs well on high-dimensional, large-cluster datasets.
Abstract
We propose k^2-means, a new clustering method which efficiently copes with large numbers of clusters and achieves low energy solutions. k^2-means builds upon the standard k-means (Lloyd's algorithm) and combines a new strategy to accelerate the convergence with a new low time complexity divisive initialization. The accelerated convergence is achieved through only looking at k_n nearest clusters and using triangle inequality bounds in the assignment step while the divisive initialization employs an optimal 2-clustering along a direction. The worst-case time complexity per iteration of our k^2-means is O(nk_nd+k^2d), where d is the dimension of the n data points and k is the number of clusters and usually n >> k >> k_n. Compared to k-means' O(nkd) complexity, our k^2-means complexity is significantly lower, at the expense of slightly increasing the memory complexity by O(nk_n+k^2). In our…
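The restricted assignment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`sq_dist`, `center_neighbors`, `restricted_assign`) are hypothetical, the triangle inequality bounds are omitted, and the neighbor lists are built by brute force. It only shows the core idea that each point is compared against the k_n centers nearest to its current center rather than against all k centers.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def center_neighbors(centers, k_n):
    """For each center, return the indices of its k_n nearest centers.

    The center itself is at distance 0, so it normally comes first.
    Computing all pairwise center distances is the O(k^2 d) term.
    """
    k = len(centers)
    neighbors = []
    for i in range(k):
        order = sorted(range(k), key=lambda j: sq_dist(centers[i], centers[j]))
        neighbors.append(order[:k_n])
    return neighbors


def restricted_assign(points, centers, labels, k_n):
    """Reassign each point using only the k_n neighbors of its current center.

    Each point evaluates k_n distances instead of k, which is the
    O(n k_n d) term. Assumption: triangle inequality pruning is left out.
    """
    neighbors = center_neighbors(centers, k_n)
    new_labels = []
    for x, current in zip(points, labels):
        candidates = neighbors[current]
        best = min(candidates, key=lambda j: sq_dist(x, centers[j]))
        new_labels.append(best)
    return new_labels


# Toy data: the third point starts in the wrong cluster (label 0).
points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
centers = [(0, 0.5), (10, 10.5), (20, 0)]
labels = [0, 0, 0, 1, 2]
print(restricted_assign(points, centers, labels, k_n=2))
```

In this sketch a point can only move to a cluster that is among the k_n neighbors of its current cluster, so a poor initialization may take several iterations to correct. That is one reason the abstract pairs the restricted assignment with a dedicated initialization.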
