An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering
Veronica Piccialli, Anna Russo Russo, Antonio M. Sudoso

TL;DR
This paper introduces a branch-and-cut algorithm for semi-supervised minimum sum-of-squares clustering (MSSC) that handles real-world datasets with background-knowledge constraints and significantly increases the size of instances that can be solved to global optimality.
Contribution
The paper presents the first efficient global optimization algorithm for semi-supervised MSSC capable of solving larger, real-world instances with background knowledge constraints.
Findings
Solves instances with up to 800 data points to global optimality
Handles various combinations of must-link and cannot-link constraints
Solves larger instances than previous exact algorithms
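To make the optimization problem concrete, here is a minimal Python sketch of the semi-supervised MSSC objective and its pairwise constraints. A must-link pair (i, j) requires points i and j to share a cluster, and a cannot-link pair requires them to be in different clusters. The function names (`mssc_objective`, `is_feasible`, `brute_force_mssc`) are hypothetical and are not the authors' code. It also assumes that every one of the k clusters must be non-empty. The brute-force search enumerates all k^n labelings and is used here only as a ground-truth reference on toy data. It is not the paper's branch-and-cut method, and it is hopeless at the 800-point scale reported above.

```python
from itertools import product
from typing import Iterable, Sequence, Tuple

Pair = Tuple[int, int]


def mssc_objective(points: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # empty clusters contribute nothing (they are rejected by is_feasible anyway)
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    return total


def is_feasible(labels: Sequence[int], must_link: Iterable[Pair],
                cannot_link: Iterable[Pair], k: int) -> bool:
    """Check the pairwise constraints and (assumption) that all k clusters are non-empty."""
    if len(set(labels)) != k:
        return False
    if any(labels[i] != labels[j] for i, j in must_link):
        return False  # a must-link pair was split across clusters
    if any(labels[i] == labels[j] for i, j in cannot_link):
        return False  # a cannot-link pair was placed in the same cluster
    return True


def brute_force_mssc(points: Sequence[Sequence[float]], k: int,
                     must_link: Sequence[Pair] = (), cannot_link: Sequence[Pair] = ()):
    """Exhaustively search all k**n labelings; only viable for tiny toy instances."""
    best_value, best_labels = float("inf"), None
    for labels in product(range(k), repeat=len(points)):
        if not is_feasible(labels, must_link, cannot_link, k):
            continue
        value = mssc_objective(points, labels, k)
        if value < best_value:  # strict '<' keeps the first optimal labeling found
            best_value, best_labels = value, labels
    return best_value, best_labels


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    print(brute_force_mssc(pts, 2))                         # unconstrained
    print(brute_force_mssc(pts, 2, cannot_link=[(0, 1)]))   # separate the first two points
```

On the four points 0, 1, 10, 11 with k = 2, the unconstrained optimum groups {0, 1} and {10, 11} with objective 1.0. A cannot-link constraint between the first two points forces the optimum to isolate point 0, which raises the objective to 182/3 ≈ 60.67. This illustrates how background knowledge can change the globally optimal clustering.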
Abstract
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, is traditionally considered an unsupervised learning task. In recent years, the use of background knowledge to improve the cluster quality and promote interpretability of the clustering process has become a hot research topic at the intersection of mathematical optimization and machine learning research. The problem of taking advantage of background information in data clustering is called semi-supervised or constrained clustering. In this paper, we present a branch-and-cut algorithm for semi-supervised MSSC, where background knowledge is incorporated as pairwise must-link and cannot-link constraints. For the lower bound procedure, we solve the semidefinite programming relaxation of the MSSC discrete optimization model, and we use a cutting-plane procedure for strengthening the bound. For the upper bound, instead,…
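The lower-bounding idea in the abstract can be illustrated with the standard matrix (0-1 SDP) reformulation of MSSC. This is a sketch under stated assumptions, not necessarily the paper's exact model. Assume X is the n × d data matrix with rows x_i, μ_j is the centroid of cluster C_j, and ML and CL denote the sets of must-link and cannot-link pairs. Every clustering maps to a matrix Z whose objective equals the MSSC value. Dropping the nonconvex condition Z² = Z therefore gives a convex relaxation whose optimum is a valid lower bound. Encoding must-link as Z_ii = Z_ij = Z_jj and cannot-link as Z_ij = 0 is one natural choice that holds for every feasible clustering. The pair and triangle inequalities shown are valid for all such Z and are typical cutting planes for tightening the bound. The abstract does not specify which families of cuts the authors use.

```latex
% A clustering C_1,...,C_k of x_1,...,x_n is encoded as
Z = \sum_{j=1}^{k} \frac{1}{|C_j|}\,\mathbf{1}_{C_j}\mathbf{1}_{C_j}^{\top},
\qquad W = XX^{\top},
\qquad \sum_{j=1}^{k}\sum_{i \in C_j} \lVert x_i - \mu_j \rVert^2 = \operatorname{tr}(W) - \operatorname{tr}(WZ).

% SDP relaxation (the condition Z^2 = Z is dropped):
\begin{aligned}
\min_{Z \in \mathbb{S}^n} \quad & \operatorname{tr}(W) - \operatorname{tr}(WZ) \\
\text{s.t.} \quad & Z\mathbf{1} = \mathbf{1}, \qquad \operatorname{tr}(Z) = k, \qquad Z \succeq 0, \qquad Z \ge 0, \\
& Z_{ii} = Z_{ij} = Z_{jj} && \forall (i,j) \in \mathcal{ML}, \\
& Z_{ij} = 0 && \forall (i,j) \in \mathcal{CL}.
\end{aligned}

% Valid inequalities usable as cuts (i, j, k distinct):
Z_{ij} \le Z_{ii}, \qquad Z_{ij} + Z_{ik} \le Z_{ii} + Z_{jk}.
```

For any feasible clustering, Z𝟏 = 𝟏 holds because each row of the block for C_j sums to |C_j| · (1/|C_j|) = 1. tr(Z) = k holds because each cluster contributes exactly 1 to the trace. A branch-and-cut scheme can then branch on pairs (i, j) by imposing either Z_ij = 0 or the must-link equalities, and re-solve the strengthened relaxation at each node.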
