COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints
Toon Van Craenendonck, Sebastijan Dumancic, Hendrik Blockeel

TL;DR
COBRA is a fast, active clustering method that over-clusters data with K-means and then merges clusters using pairwise constraints, minimizing queries and outperforming existing methods in quality and speed.
Contribution
Introduces COBRA, a novel active clustering algorithm that merges the small clusters produced by over-clustering while asking few pairwise queries, exploiting constraint transitivity and entailment.
Findings
Outperforms state-of-the-art in clustering quality
Runs faster than existing methods
Does not require pre-specifying number of clusters
Abstract
Clustering is inherently ill-posed: there often exist multiple valid clusterings of a single dataset, and without any additional information a clustering system has no way of knowing which clustering it should produce. This motivates the use of constraints in clustering, as they allow users to communicate their interests to the clustering system. Active constraint-based clustering algorithms select the most useful constraints to query, aiming to produce a good clustering using as few constraints as possible. We propose COBRA, an active method that first over-clusters the data by running K-means with a K that is intended to be too large, and subsequently merges the resulting small clusters into larger ones based on pairwise constraints. In its merging step, COBRA is able to keep the number of pairwise queries low by maximally exploiting constraint transitivity and entailment. We…
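The over-cluster-then-merge idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`over_cluster`, `merge_with_queries`), the strided K-means initialisation, the choice of representative (the member closest to the group mean), and the closest-pair-first query order are all choices made for this sketch. The `oracle` callback stands in for the user and returns True for a must-link answer.

```python
import math
from itertools import combinations


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(xs) / len(points) for xs in zip(*points))


def over_cluster(points, k, iters=20):
    """Plain Lloyd's K-means with a deliberately large k; returns non-empty groups.

    Assumption of this sketch: centers start at evenly strided data points
    so that the result is deterministic.
    """
    centers = points[:: len(points) // k][:k]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return [g for g in groups if g]


def merge_with_queries(supers, oracle):
    """Merge small clusters using must-link / cannot-link answers from `oracle`."""
    # One representative per small cluster: the member closest to its mean.
    reps = [min(g, key=lambda p, m=mean(g): math.dist(p, m)) for g in supers]
    clusters = {i: [i] for i in range(len(supers))}  # cluster id -> small-cluster ids
    cannot = set()  # frozenset({id_a, id_b}) for clusters known to be distinct
    queries = 0
    while True:
        best = None
        for a, b in combinations(clusters, 2):
            if frozenset((a, b)) in cannot:
                continue  # answer already known, so no query is spent
            d, i, j = min((math.dist(reps[i], reps[j]), i, j)
                          for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b, i, j)
        if best is None:
            break  # every remaining pair of clusters is cannot-linked
        _, a, b, i, j = best
        queries += 1
        if oracle(reps[i], reps[j]):  # must-link: merge b into a
            clusters[a].extend(clusters.pop(b))
            # Entailment: b's cannot-links now hold for the merged cluster.
            cannot = {frozenset(a if c == b else c for c in pair) for pair in cannot}
        else:
            cannot.add(frozenset((a, b)))
    result = [[p for s in ids for p in supers[s]] for ids in clusters.values()]
    return result, queries


# Toy usage: two well-separated grids; the simulated user separates them by x.
grid = [(float(i % 5), float(i // 5)) for i in range(20)]
data = grid + [(x + 100.0, y + 100.0) for x, y in grid]
supers = over_cluster(data, k=6)
clusters, n_queries = merge_with_queries(
    supers, lambda p, q: (p[0] < 50) == (q[0] < 50))
```

Because queries are asked between representatives of small clusters rather than between individual points, one answer decides the relation for every point in both groups. Recorded cannot-links are carried over to merged clusters, so a pair of clusters whose relation is already implied is never queried again.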
