A Global Optimization Algorithm for K-Center Clustering of One Billion Samples
Jiayang Ren, Ningning You, Kaixun Hua, Chaojie Ji, Yankai Cao

TL;DR
This paper introduces a scalable global optimization algorithm for K-center clustering that is guaranteed to converge to the global optimum, remains practical on extremely large datasets, and yields lower objective values than heuristic methods.
Contribution
A novel reduced-space branch and bound algorithm with a two-stage decomposable lower bound and acceleration techniques for large-scale K-center clustering.
Findings
Solves K-center problems with up to one billion samples within hours.
Achieves a 25.8% average reduction in the objective value compared with heuristic methods.
Demonstrates effectiveness on synthetic and real-world datasets.
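To make the comparison with heuristics concrete, here is a minimal Python sketch on a toy instance. It is not the paper's branch and bound algorithm: `farthest_first` is the classic greedy farthest-first traversal, used here only as a stand-in for "a heuristic", and `brute_force` enumerates every K-subset, which is feasible only for tiny inputs. All function names and the sample points are hypothetical, and Euclidean distance is an assumption.

```python
from itertools import combinations
import math


def kcenter_objective(points, centers):
    """Largest distance from any sample to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def farthest_first(points, k, start=0):
    """Greedy heuristic (assumed stand-in, not the paper's method):
    repeatedly add the sample farthest from the current centers."""
    centers = [points[start]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def brute_force(points, k):
    """Exact optimum by enumerating all K-subsets of the samples.
    Exponential cost; shown only to expose the gap on a toy input."""
    return min(combinations(points, k),
               key=lambda cs: kcenter_objective(points, cs))


# Hypothetical toy data: six collinear samples, K = 2.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
h_val = kcenter_objective(points, farthest_first(points, 2))
o_val = kcenter_objective(points, brute_force(points, 2))
print(h_val, o_val)  # 3.0 2.0
```

On this toy input the heuristic picks the two extreme samples and ends with a maximum within-cluster distance of 3.0, while the exhaustive search finds a pair of centers achieving 2.0. The gap is illustrative only and says nothing about the 25.8% figure reported above.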
Abstract
This paper presents a practical global optimization algorithm for the K-center clustering problem, which aims to select K samples as the cluster centers to minimize the maximum within-cluster distance. This algorithm is based on a reduced-space branch and bound scheme and guarantees convergence to the global optimum in a finite number of steps by only branching on the regions of centers. To improve efficiency, we have designed a two-stage decomposable lower bound, the solution of which can be derived in a closed form. In addition, we also propose several acceleration techniques to narrow down the region of centers, including bounds tightening, sample reduction, and parallelization. Extensive studies on synthetic and real-world datasets have demonstrated that our algorithm can solve the K-center problems to global optimality within 4 hours for ten million samples in the serial mode and one…
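The objective described in the first sentence of the abstract can be written compactly as below. The notation is assumed for illustration and is not taken from the source: $X = \{x_1, \dots, x_S\}$ is the sample set, $\mu^1, \dots, \mu^K$ are the centers chosen from the samples, and $d$ is the distance measure.

```latex
\min_{\mu^1, \dots, \mu^K \in X} \;
\max_{s \in \{1, \dots, S\}} \;
\min_{k \in \{1, \dots, K\}} \; d\left(x_s, \mu^k\right)
```

Read from the inside out: each sample is assigned to its nearest center, the worst such assignment defines the cost, and the centers are chosen to make that worst case as small as possible.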
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Bayesian Methods and Mixture Models
