Kempe Swap K-Means: A Scalable Near-Optimal Solution for Semi-Supervised Clustering

Yuxuan Ren; Shijie Deng

arXiv:2603.27417·cs.LG·March 31, 2026

TL;DR

Kempe Swap K-Means is a scalable heuristic for semi-supervised clustering that improves accuracy and efficiency by using Kempe chain swaps and controlled perturbations.

Contribution

Introduces Kempe Swap K-Means, a novel dual-phase heuristic algorithm with perturbation strategies for constrained clustering under must-link and cannot-link constraints.

Findings

01. Achieves near-optimal clustering results.

02. Outperforms state-of-the-art methods in accuracy.

03. Maintains high computational efficiency and scalability.

Abstract

This paper presents a novel centroid-based heuristic algorithm, termed Kempe Swap K-Means, for constrained clustering under rigid must-link (ML) and cannot-link (CL) constraints. The algorithm employs a dual-phase iterative process: an assignment step that utilizes Kempe chain swaps to refine the current clustering in the constrained solution space and a centroid update step that computes optimal cluster centroids. To enhance global search capabilities and avoid local optima, the framework incorporates controlled perturbations during the update phase. Empirical evaluations demonstrate that the proposed method achieves near-optimal partitions while maintaining high computational efficiency and scalability. The results indicate that Kempe Swap K-Means consistently outperforms state-of-the-art benchmarks in both clustering accuracy and algorithmic efficiency for large-scale datasets.
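The abstract does not spell out how a Kempe chain swap is carried out, so the following is only a minimal sketch of the standard graph-coloring notion applied to cannot-link constraints, not the authors' implementation. It assumes that cluster labels play the role of colors on the CL graph, that a chain is the connected component of points in two chosen clusters reachable through CL edges, and that must-link groups have already been merged into single points. All function names (`build_cl_adjacency`, `kempe_chain`, `kempe_swap`, `is_feasible`) are hypothetical.

```python
from collections import deque


def build_cl_adjacency(cl_edges):
    """Build an adjacency map from a list of cannot-link pairs (i, j)."""
    adj = {}
    for i, j in cl_edges:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    return adj


def kempe_chain(start, target, labels, cl_adj):
    """Collect the points reachable from `start` through CL edges while
    staying inside the two clusters labels[start] and `target`."""
    pair = (labels[start], target)
    chain = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in cl_adj.get(u, ()):
            if v not in chain and labels[v] in pair:
                chain.add(v)
                queue.append(v)
    return chain


def kempe_swap(start, target, labels, cl_adj):
    """Return a new label list in which the two clusters are exchanged
    on the chain containing `start`. The input list is not modified."""
    a, b = labels[start], target
    new_labels = list(labels)
    for u in kempe_chain(start, target, labels, cl_adj):
        new_labels[u] = b if labels[u] == a else a
    return new_labels


def is_feasible(labels, cl_edges):
    """A labeling is CL-feasible if no cannot-link pair shares a cluster."""
    return all(labels[i] != labels[j] for i, j in cl_edges)


# Hypothetical toy instance: five points on a path of cannot-link edges.
labels = [0, 1, 0, 2, 1]
cl_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
cl_adj = build_cl_adjacency(cl_edges)

# Move point 0 toward cluster 1; points 1 and 2 are dragged along.
print(kempe_swap(0, 1, labels, cl_adj))  # [1, 0, 1, 2, 1]
```

Because every CL neighbor of a chain member that lies in either of the two clusters is itself in the chain, exchanging the two labels on the whole chain cannot create a new violation. An assignment step built on this idea could, for instance, accept a swap only when it lowers the within-cluster sum of squared distances; that acceptance rule is an assumption here, since the abstract does not state one.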
