Semi-Supervised Clustering with Inaccurate Pairwise Annotations
Daniel Gribel, Michel Gendreau, Thibaut Vidal

TL;DR
This paper introduces a semi-supervised clustering model that uses pairwise must-link and cannot-link annotations while accounting for their possible inaccuracies, improving clustering performance in noisy real-world settings.
Contribution
The paper presents a generative probabilistic model that accounts for annotation inaccuracies and can use prior knowledge of expert accuracy, making clustering more robust under weak supervision.
Findings
Relational information improves clustering accuracy even with noisy annotations.
The model detects meaningful groups in real-world datasets that do not fit its data-distribution assumptions.
Incorporating prior knowledge of annotation accuracy benefits clustering performance.
Abstract
Pairwise relational information is a useful way of providing partial supervision in domains where class labels are difficult to acquire. This work presents a clustering model that incorporates pairwise annotations in the form of must-link and cannot-link relations and considers possible annotation inaccuracies (i.e., a common setting when experts provide pairwise supervision). We propose a generative model that assumes Gaussian-distributed data samples along with must-link and cannot-link relations generated by stochastic block models. We adopt a maximum-likelihood approach and demonstrate that, even when supervision is weak and inaccurate, accounting for relational information significantly improves clustering performance. Relational information also helps to detect meaningful groups in real-world datasets that do not fit the original data-distribution assumptions. Additionally, we…
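The setting described in the abstract can be illustrated with a small synthetic-data sketch. It is not the authors' implementation: the function name `sample_noisy_annotations`, its parameters, and the label-flipping noise model (each annotation is correct with probability `accuracy`) are assumptions chosen for illustration, standing in for the stochastic block models used in the paper.

```python
import numpy as np

def sample_noisy_annotations(n=60, k=3, dim=2, n_pairs=200, accuracy=0.9, seed=0):
    """Hypothetical generator: Gaussian clusters plus noisy pairwise annotations."""
    rng = np.random.default_rng(seed)
    # Latent cluster labels and Gaussian cluster means (illustrative values).
    labels = rng.integers(0, k, size=n)
    means = rng.normal(scale=5.0, size=(k, dim))
    # Each sample is drawn from a unit-variance Gaussian around its cluster mean.
    X = means[labels] + rng.normal(size=(n, dim))
    # Pick random pairs to annotate and discard self-pairs.
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j
    i, j = i[keep], j[keep]
    # Ground truth: True means the pair shares a cluster (must-link).
    same = labels[i] == labels[j]
    # Assumed noise model: the annotator is right with probability `accuracy`,
    # otherwise the relation is flipped (must-link <-> cannot-link).
    correct = rng.random(i.size) < accuracy
    must_link = np.where(correct, same, ~same)
    return X, labels, i, j, must_link

X, labels, i, j, must_link = sample_noisy_annotations()
agreement = np.mean(must_link == (labels[i] == labels[j]))
print(f"{must_link.size} annotated pairs, empirical accuracy {agreement:.2f}")
```

In this sketch `must_link[p]` is `True` for a must-link annotation on pair `(i[p], j[p])` and `False` for a cannot-link annotation. A clustering method that knows, or estimates, the annotator accuracy can then weigh these relations instead of treating them as hard constraints.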
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Complex Network Analysis Techniques
