TL;DR
This paper introduces an algorithm that combines column generation (CG) with dynamic constraint aggregation (DCA) to solve large-scale minimum sum-of-squares clustering (MSSC) problems, and reports that it outperforms existing methods.
Contribution
It presents a new exact solution approach for MSSC that embeds DCA within a CG framework. The method is tuned through ablation studies and is reported to outperform existing approaches.
Findings
Significantly faster solution times compared to state-of-the-art methods.
Effective reduction in the number of constraints through DCA.
Demonstrated scalability to large MSSC instances.
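To make the constraint reduction concrete, here is a minimal sketch of the generic DCA idea, not the paper's implementation. It assumes a set partitioning model with one constraint per data point, a partition of those constraints into disjoint groups, and the usual DCA notion that a column is compatible with the partition when it covers each group entirely or not at all. The names `is_compatible` and `aggregate_column`, and the toy data, are hypothetical.

```python
from typing import FrozenSet, List

Column = FrozenSet[int]   # a candidate cluster: the set of point indices it covers
Group = FrozenSet[int]    # a group of set partitioning constraints (point indices)


def is_compatible(column: Column, partition: List[Group]) -> bool:
    """A column is compatible if it covers each group wholly or not at all."""
    return all(group <= column or group.isdisjoint(column) for group in partition)


def aggregate_column(column: Column, partition: List[Group]) -> FrozenSet[int]:
    """Express a compatible column by the indices of the groups it covers."""
    return frozenset(i for i, group in enumerate(partition) if group <= column)


# Toy instance (illustrative only): 6 points, so 6 original constraints.
n_points = 6
partition = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})]
columns = [frozenset({0, 1, 5}), frozenset({2, 3, 4}), frozenset({1, 2})]

# Only compatible columns can enter the aggregated master problem directly.
compatible = [c for c in columns if is_compatible(c, partition)]

# The aggregated problem has one constraint per group instead of one per point.
n_aggregated_constraints = len(partition)
```

In this toy case the aggregated problem has 3 constraints instead of 6, and the column covering points 1 and 2 is incompatible because it splits two groups. Handling such incompatible columns requires changing the partition, which this sketch does not cover.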
Abstract
The minimum sum-of-squares clustering problem (MSSC), also known as k-means clustering, refers to the problem of partitioning n data points into k clusters, with the objective of minimizing the total sum of squared Euclidean distances between each point and the center of its assigned cluster. We propose an efficient algorithm for solving large-scale MSSC instances, which combines column generation (CG) with dynamic constraint aggregation (DCA) to effectively reduce the number of constraints considered in the CG master problem. DCA was originally conceived to reduce degeneracy in set partitioning problems by utilizing an aggregated restricted master problem obtained from a partition of the set partitioning constraints into disjoint clusters. In this work, we explore the use of DCA within a CG algorithm for the exact solution of MSSC. Our method is fine-tuned by a series of ablation…
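As a small illustration of the MSSC objective described in the abstract, the following sketch computes the total sum of squared Euclidean distances from each point to the centroid of its assigned cluster. The function names and the toy data are assumptions made for illustration and do not come from the paper. In a set partitioning view of MSSC, `cluster_cost` would correspond to the cost of one column (one candidate cluster).

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]


def cluster_cost(points: List[Point]) -> float:
    """Sum of squared Euclidean distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def mssc_objective(points: List[Point], labels: List[int]) -> float:
    """Total MSSC cost of an assignment of points to cluster labels."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return sum(cluster_cost(members) for members in clusters.values())


# Toy data (illustrative only): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = mssc_objective(points, [0, 0, 1, 1])  # groups nearby points together
bad = mssc_objective(points, [0, 1, 0, 1])   # pairs distant points
```

Here the assignment that groups nearby points has a much lower objective value than the one that pairs distant points, which is the quantity an exact MSSC method minimizes over all partitions into k clusters.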
Taxonomy
Methods: Sparse Evolutionary Training
