Log-Time K-Means Clustering for 1D Data: Novel Approaches with Proof and Implementation
Jake Hyun

TL;DR
This paper presents optimized algorithms for 1D k-means clustering that exploit the sorted structure of 1D data for large speedups, demonstrated through benchmarks and a practical application, bridging theory and implementation.
Contribution
The paper introduces novel, efficient algorithms for 1D k-means clustering that use sorted data, prefix sums, and binary search, with proven complexity improvements and an open-source implementation.
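As a rough illustration of why prefix sums help here, the sketch below computes the within-cluster sum of squared deviations for any contiguous range of sorted 1D data in constant time after a linear-time preprocessing pass. It is not taken from the paper's implementation; the function names `build_prefix_sums` and `range_cost` are hypothetical and chosen only for this example.

```python
from itertools import accumulate


def build_prefix_sums(sorted_xs):
    """Return prefix sums of x and x^2, each with a leading 0 (hypothetical helper)."""
    s1 = [0.0, *accumulate(sorted_xs)]
    s2 = [0.0, *accumulate(x * x for x in sorted_xs)]
    return s1, s2


def range_cost(s1, s2, lo, hi):
    """Sum of squared deviations from the mean of sorted_xs[lo:hi], in O(1)."""
    count = hi - lo
    if count <= 0:
        return 0.0
    total = s1[hi] - s1[lo]        # sum of x over the range
    total_sq = s2[hi] - s2[lo]     # sum of x^2 over the range
    # sum (x - mean)^2 = sum x^2 - (sum x)^2 / count
    return total_sq - total * total / count


xs = sorted([4.0, 1.0, 2.0, 10.0, 11.0])
s1, s2 = build_prefix_sums(xs)
print(range_cost(s1, s2, 0, 3))  # cost of treating [1, 2, 4] as one cluster
print(range_cost(s1, s2, 3, 5))  # cost of treating [10, 11] as one cluster
```

Because every cluster in sorted 1D data is a contiguous range, a cost query of this kind can be answered without rescanning the points, which is one plausible reading of how prefix sums enter the complexity improvements.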
Findings
Achieves over 4500x speedup over scikit-learn on large datasets.
Attains 300x speedup in LLM quantization tasks.
Maintains clustering quality with improved computational efficiency.
Abstract
Clustering is a key task in machine learning, with $k$-means being widely used for its simplicity and effectiveness. While 1D clustering is common, existing methods often fail to exploit the structure of 1D data, leading to inefficiencies. This thesis introduces optimized algorithms for $k$-means++ initialization and Lloyd's algorithm, leveraging sorted data, prefix sums, and binary search for improved computational performance. The main contributions are: (1) an optimized $k$-cluster algorithm achieving $O(l \cdot k^2 \cdot \log n)$ complexity for greedy $k$-means++ initialization and $O(i \cdot k \cdot \log n)$ complexity for Lloyd's algorithm, where $l$ is the number of greedy $k$-means++ local trials, and $i$ is the number of Lloyd's algorithm iterations, and (2) a binary search-based two-cluster algorithm, achieving $O(\log n)$ runtime with deterministic convergence to a Lloyd's algorithm…
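The combination of sorted data, prefix sums, and binary search described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `lloyd_1d` is hypothetical, points lying exactly on a boundary are assigned to the left cluster, an empty cluster keeps its previous centroid, and `max_iter` is assumed to be at least 1. Each iteration performs one binary search per cluster boundary and one prefix-sum lookup per cluster, so it costs on the order of k log n after the prefix sums are built.

```python
from bisect import bisect_right
from itertools import accumulate


def lloyd_1d(sorted_xs, centroids, max_iter=100):
    """Lloyd's iterations on sorted 1D data (illustrative sketch only)."""
    n = len(sorted_xs)
    prefix = [0.0, *accumulate(sorted_xs)]  # prefix[i] = sum of sorted_xs[:i]
    centroids = sorted(centroids)
    cuts = [0, n]
    for _ in range(max_iter):
        # Assignment step: the boundary between adjacent clusters is the
        # midpoint of their centroids, located by binary search.
        cuts = [0]
        for a, b in zip(centroids, centroids[1:]):
            cuts.append(bisect_right(sorted_xs, (a + b) / 2))
        cuts.append(n)
        # Update step: each cluster mean comes from a prefix-sum difference.
        new_centroids = []
        for j, c in enumerate(centroids):
            lo, hi = cuts[j], cuts[j + 1]
            if hi > lo:
                new_centroids.append((prefix[hi] - prefix[lo]) / (hi - lo))
            else:
                new_centroids.append(c)  # assumption: keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, cuts


xs = sorted([4.0, 1.0, 2.0, 10.0, 11.0])
centers, cuts = lloyd_1d(xs, [0.0, 5.0])
print(centers, cuts)  # cluster j covers xs[cuts[j]:cuts[j + 1]]
```

Returning the cut indices rather than a per-point label array keeps the per-iteration work independent of n apart from the binary searches; labels can be materialized once at the end if needed.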
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Data Management and Algorithms
