No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds
Ahmed Imtiaz Humayun, Randall Balestriero, Anastasios Kyrillidis, Richard Baraniuk

TL;DR
This paper introduces a novel constrained k-means clustering method that enforces a maximum radius on clusters to improve robustness against dataset imbalances and sampling artifacts, using semi-definite programming.
Contribution
It presents the first k-means variant with hard radius constraints, formulated as a semi-definite program followed by a linear assignment problem with quadratic constraints, enhancing robustness in real-world data scenarios.
Findings
Method is robust to dataset imbalances
Effective against sampling artifacts
First to incorporate hard radius constraints in k-means
Abstract
Centroid-based clustering methods such as k-means, k-medoids and k-centers are heavily applied as a go-to tool in exploratory data analysis. In many cases, those methods are used to obtain representative centroids of the data manifold for visualization or summarization of a dataset. Real-world datasets often contain inherent abnormalities, e.g., repeated samples and sampling bias, that manifest as imbalanced clustering. We propose to remedy such a scenario by introducing a maximal radius constraint on the clusters formed by the centroids, i.e., samples from the same cluster should not be more than a fixed maximum distance apart. We achieve this constraint by solving a semi-definite program, followed by a linear assignment problem with quadratic constraints. Through qualitative results, we show that our proposed method is robust towards dataset imbalances and sampling…
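To make the radius constraint concrete, here is a minimal Python sketch that checks whether a given clustering respects a radius upper bound. It is an illustration only and not the paper's semi-definite programming solver; the function names `cluster_radii` and `satisfies_radius_bound`, the bound `r`, and the toy data are assumptions introduced for this example. By the triangle inequality, if every sample lies within `r` of its centroid, then any two samples in the same cluster are at most `2 * r` apart.

```python
import numpy as np


def cluster_radii(X, labels, centroids):
    """Largest Euclidean distance from each centroid to a sample assigned to it."""
    radii = np.zeros(len(centroids))
    for k, c in enumerate(centroids):
        members = X[labels == k]
        if len(members) > 0:
            radii[k] = np.linalg.norm(members - c, axis=1).max()
    return radii


def satisfies_radius_bound(X, labels, centroids, r):
    """True if no cluster has a sample farther than r from its centroid."""
    return bool(np.all(cluster_radii(X, labels, centroids) <= r))


# Hypothetical imbalanced data: 50 repeated samples at the origin
# plus three sparse samples spread along the x-axis.
X = np.array([[0.0, 0.0]] * 50 + [[4.0, 0.0], [6.0, 0.0], [10.0, 0.0]])

# A hand-written assignment: cluster 0 holds the repeats, cluster 1 the rest.
labels = np.array([0] * 50 + [1] * 3)
centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

radii = cluster_radii(X, labels, centroids)
ok_tight = satisfies_radius_bound(X, labels, centroids, r=3.0)
ok_loose = satisfies_radius_bound(X, labels, centroids, r=3.5)
```

In this toy assignment the sparse cluster has its centroid at x = 20/3, so its farthest sample (at x = 10) lies 10/3 away: the clustering violates a bound of `r = 3.0` but satisfies `r = 3.5`. A radius-constrained method would have to add or move centroids in the first case rather than accept the assignment.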
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications · Artificial Intelligence in Healthcare
Methods: k-Means Clustering
