An algorithm for clustering with confidence-based must-link and cannot-link constraints
Philipp Baumann, Dorit S. Hochbaum

TL;DR
This paper introduces PCCC, a flexible semi-supervised clustering algorithm that uses integer programming to incorporate confidence-based pairwise constraints. It scales to large datasets and outperforms state-of-the-art methods in runtime and solution quality.
Contribution
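The paper presents a novel algorithm that can treat each pairwise constraint either as hard (always satisfied) or as soft (violable at a penalty that reflects its confidence level) within the same problem, whereas existing methods treat all constraints as hard or all as soft. This improves scalability and solution quality in semi-supervised clustering.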
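For illustration, an assignment step that mixes both constraint types can be written as an integer program. This is a generic sketch, not necessarily the paper's exact model. The notation is our own assumption: the cluster centers $\mu_j$ are held fixed, $x_{ij}=1$ means object $a_i$ is assigned to cluster $j$, $w_{ii'}$ is the confidence weight of a soft constraint, $\lambda$ is a penalty factor, and $\mathcal{H}_{ML}, \mathcal{H}_{CL}, \mathcal{S}_{ML}, \mathcal{S}_{CL}$ are the hard and soft must-link and cannot-link pairs.

```latex
\begin{aligned}
\min_{x,y,z}\quad & \sum_{i}\sum_{j=1}^{k} \lVert a_i-\mu_j\rVert^2\,x_{ij}
  + \lambda\sum_{(i,i')\in\mathcal{S}_{ML}} w_{ii'}\,y_{ii'}
  + \lambda\sum_{(i,i')\in\mathcal{S}_{CL}} w_{ii'}\,z_{ii'}\\
\text{s.t.}\quad & \sum_{j=1}^{k} x_{ij}=1 && \forall i\\
& x_{ij}=x_{i'j} && \forall (i,i')\in\mathcal{H}_{ML},\ \forall j\\
& x_{ij}+x_{i'j}\le 1 && \forall (i,i')\in\mathcal{H}_{CL},\ \forall j\\
& x_{ij}-x_{i'j}\le y_{ii'} && \forall (i,i')\in\mathcal{S}_{ML},\ \forall j\\
& x_{ij}+x_{i'j}-1\le z_{ii'} && \forall (i,i')\in\mathcal{S}_{CL},\ \forall j\\
& x_{ij},\,y_{ii'},\,z_{ii'}\in\{0,1\}
\end{aligned}
```

A violated soft must-link forces $y_{ii'}=1$ and a violated soft cannot-link forces $z_{ii'}=1$, so each violation costs $\lambda w_{ii'}$, while violations of hard pairs are excluded from the feasible set altogether.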
Findings
Scales to large datasets with up to 60,000 objects.
Outperforms state-of-the-art methods in runtime and solution quality.
Effectively incorporates confidence levels in pairwise constraints.
Abstract
We study here the semi-supervised k-clustering problem, in which information is available on whether pairs of objects are in the same cluster or in different clusters. This information is available either with certainty or with a limited level of confidence. We introduce the PCCC (Pairwise-Confidence-Constraints-Clustering) algorithm, which iteratively assigns objects to clusters while accounting for the information provided on the pairs of objects. Our algorithm uses integer programming for the assignment of objects, which allows relationships to be included as hard constraints that are guaranteed to be satisfied or as soft constraints that can be violated subject to a penalty. This flexibility distinguishes our algorithm from the state of the art, in which all pairwise constraints are either considered hard or all are considered soft. We developed an enhanced multi-start approach and a model-size…
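The iterate-assign-update idea described above can be sketched in Python as a k-means-style loop. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`pccc_sketch`, `assign`, `update_centers`), the penalty factor `lam`, and the dictionary format for confidence weights are all hypothetical. The assignment step is solved by exhaustive enumeration as a stand-in for the integer-programming solver the paper uses, so it only works for a handful of objects. The multi-start approach and the other enhancements mentioned in the abstract are not reproduced.

```python
import itertools
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length point tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def assign(points, centers, hard_ml, hard_cl, soft_ml, soft_cl, lam):
    """Constrained assignment step (hypothetical stand-in for an IP solver).

    Enumerates all k**n labelings, so it is only usable for tiny inputs.
    Hard pairs must be satisfied; each violated soft pair adds lam * weight.
    """
    n, k = len(points), len(centers)
    best, best_cost = None, math.inf
    for labels in itertools.product(range(k), repeat=n):
        # Hard must-link / cannot-link: skip infeasible labelings outright.
        if any(labels[i] != labels[j] for i, j in hard_ml):
            continue
        if any(labels[i] == labels[j] for i, j in hard_cl):
            continue
        cost = sum(sq_dist(points[i], centers[labels[i]]) for i in range(n))
        # Soft constraints: pay a confidence-weighted penalty when violated.
        cost += lam * sum(w for (i, j), w in soft_ml.items() if labels[i] != labels[j])
        cost += lam * sum(w for (i, j), w in soft_cl.items() if labels[i] == labels[j])
        if cost < best_cost:
            best, best_cost = list(labels), cost
    if best is None:
        raise ValueError("hard constraints admit no feasible assignment")
    return best, best_cost


def update_centers(points, labels, centers):
    """Move each center to the mean of its members; keep it if the cluster is empty."""
    new = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            new.append(old)
    return new


def pccc_sketch(points, init_centers, hard_ml=(), hard_cl=(),
                soft_ml=None, soft_cl=None, lam=1.0, max_iter=50):
    """Alternate constrained assignment and center updates until labels stop changing."""
    soft_ml = soft_ml or {}
    soft_cl = soft_cl or {}
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        new_labels, _cost = assign(points, centers, hard_ml, hard_cl,
                                   soft_ml, soft_cl, lam)
        if new_labels == labels:
            break  # converged: centers already match these labels
        labels = new_labels
        centers = update_centers(points, labels, centers)
    return labels, centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    init = [(0.0, 0.0), (10.0, 0.0)]
    print(pccc_sketch(pts, init)[0])                              # [0, 0, 1, 1]
    print(pccc_sketch(pts, init, soft_cl={(0, 1): 1000.0})[0])    # [0, 1, 1, 1]
```

In the usage lines, a soft cannot-link on the nearby objects 0 and 1 is ignored when its weight is small but separates them once the confidence-weighted penalty exceeds the geometric cost of splitting them. A hard pair in `hard_ml` or `hard_cl` would be enforced regardless of cost.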
Taxonomy
Topics: Data Management and Algorithms · Advanced Clustering Algorithms Research · Multi-Criteria Decision Making
