TL;DR
This paper introduces HG-means, a scalable hybrid genetic algorithm for the minimum sum-of-squares clustering (MSSC) problem that finds better solutions than existing heuristics, especially on high-dimensional datasets.
Contribution
The paper presents HG-means, a hybrid genetic algorithm for MSSC that uses K-means as a local search in combination with problem-tailored crossover, mutation, and diversification operators, and that outperforms existing methods in solution quality.
Findings
HG-means outperforms recent state-of-the-art algorithms in solution quality.
The algorithm produces clusters closer to ground truth in high-dimensional Gaussian datasets.
The algorithm is scalable and effective on large, complex clustering problems.
Abstract
Minimum sum-of-squares clustering (MSSC) is a widely used clustering model, of which the popular K-means algorithm constitutes a local minimizer. It is well known that the solutions of K-means can be arbitrarily distant from the true MSSC global optimum, and dozens of alternative heuristics have been proposed for this problem. However, no other algorithm has been predominantly adopted in the literature. This may be related to differences of computational effort, or to the assumption that a near-optimal solution of the MSSC has only a marginal impact on clustering validity. In this article, we dispute this belief. We introduce an efficient population-based metaheuristic that uses K-means as a local search in combination with problem-tailored crossover, mutation, and diversification operators. This algorithm can be interpreted as a multi-start K-means, in which the initial center…
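To make the objective concrete, the following is a minimal sketch of the MSSC cost, of K-means (Lloyd's algorithm) used as a local search, and of a plain multi-start K-means that keeps the best local minimum found. It is not the HG-means algorithm: the paper's crossover, mutation, and diversification operators are not reproduced here. The function names (`mssc_cost`, `kmeans_local_search`, `multi_start_kmeans`), the choice of random data points as initial centers, and all parameter defaults are assumptions made for illustration.

```python
import numpy as np

def mssc_cost(X, centers):
    """MSSC objective: sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def kmeans_local_search(X, centers, max_iter=100):
    """Lloyd's K-means started from `centers`; it never increases the MSSC cost."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def multi_start_kmeans(X, k, n_starts=10, seed=0):
    """Run K-means from several random starts and keep the lowest-cost result.

    Assumption: initial centers are k distinct data points chosen uniformly
    at random; this is an illustrative baseline, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    best_centers, best_cost = None, np.inf
    for _ in range(n_starts):
        init = X[rng.choice(len(X), size=k, replace=False)]
        centers = kmeans_local_search(X, init)
        cost = mssc_cost(X, centers)
        if cost < best_cost:
            best_centers, best_cost = centers, cost
    return best_centers, best_cost
```

Each K-means run only reaches a local minimum that depends on its starting centers, which is why the baseline above restarts several times and compares runs by their MSSC cost.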
