Strong bounds for large-scale Minimum Sum-of-Squares Clustering
Anna Livia Croella, Veronica Piccialli, Antonio M. Sudoso

TL;DR
This paper introduces a divide-and-conquer method for validating heuristic solutions to large-scale Minimum Sum-of-Squares Clustering (MSSC). It computes strong lower bounds on the optimal objective value, so the optimality gap of a heuristic solution can be assessed efficiently.
Contribution
The paper presents a novel approach combining problem decomposition and heuristics to efficiently estimate optimality gaps in large-scale MSSC problems.
Findings
Achieves optimality gaps below 3% in most large-scale instances.
Demonstrates computational efficiency for large datasets.
Provides a practical validation tool for heuristic clustering solutions.
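As an illustration of what an optimality gap measures, the sketch below computes a relative gap from an upper bound (the objective value of a heuristic solution) and a lower bound on the optimal value. The formula (UB - LB) / UB is a common convention and is assumed here; this summary does not state the exact definition the paper uses. The function name and the numbers are hypothetical.

```python
def optimality_gap(upper_bound: float, lower_bound: float) -> float:
    """Relative gap (UB - LB) / UB between a heuristic objective value (UB)
    and a valid lower bound (LB). Assumed definition, for illustration only."""
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    if lower_bound > upper_bound:
        raise ValueError("a valid lower bound cannot exceed the upper bound")
    return (upper_bound - lower_bound) / upper_bound


# Hypothetical values: a heuristic objective of 100.0 and a lower bound of 97.5
gap = optimality_gap(100.0, 97.5)
print(f"gap = {gap:.1%}")
```

A gap of zero would certify that the heuristic solution is globally optimal; a small positive gap certifies that it is close to optimal.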
Abstract
Clustering is a fundamental technique in data analysis and machine learning, used to group similar data points together. Among various clustering methods, the Minimum Sum-of-Squares Clustering (MSSC) is one of the most widely used. MSSC aims to minimize the total squared Euclidean distance between data points and their corresponding cluster centroids. Due to the unsupervised nature of clustering, achieving global optimality is crucial, yet computationally challenging. The complexity of finding the global solution increases exponentially with the number of data points, making exact methods impractical for large-scale datasets. Even obtaining strong lower bounds on the optimal MSSC objective value is computationally prohibitive, making it difficult to assess the quality of heuristic solutions. We address this challenge by introducing a novel method to validate heuristic MSSC solutions…
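The MSSC objective described in the abstract can be sketched in a few lines of plain Python: for a given assignment of points to clusters, each centroid is the mean of its cluster, and the objective is the sum of squared Euclidean distances from every point to its centroid. The function name and the toy data are hypothetical, and the sketch only evaluates a given assignment; it is not the paper's bounding method.

```python
from collections import defaultdict


def mssc_objective(points, labels):
    """Sum of squared Euclidean distances between each point and the
    centroid (mean) of the cluster it is assigned to."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The centroid is the coordinate-wise mean of the cluster's points.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Toy data (hypothetical): two clusters of two points each in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]
print(mssc_objective(points, labels))
```

Evaluating this quantity for a heuristic assignment gives an upper bound on the optimal MSSC value, which is the quantity a lower bound is compared against.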
Taxonomy
Topics: Face and Expression Recognition · Sparse and Compressive Sensing Techniques · Advanced Clustering Algorithms Research
