TL;DR
LSD-C introduces a clustering method that enforces linear separability in feature space, improving unsupervised clustering performance on image and text datasets by combining pairwise similarity, self-supervised pretraining, and data augmentation.
Contribution
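The paper proposes a clustering algorithm that builds pairwise connections between minibatch samples in deep feature space, groups connected samples into clusters, and enforces a linear separation between those clusters by training on the connections with a binary cross-entropy loss.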
Findings
Outperforms existing methods on CIFAR 10/100, STL 10, MNIST, and Reuters 10K datasets.
Effectively combines pairwise similarity, self-supervised pretraining, and data augmentation.
Achieves significant improvements in clustering accuracy and separation quality.
Abstract
We present LSD-C, a novel method to identify clusters in an unlabeled dataset. Our algorithm first establishes pairwise connections in the feature space between the samples of the minibatch based on a similarity metric. Then it regroups in clusters the connected samples and enforces a linear separation between clusters. This is achieved by using the pairwise connections as targets together with a binary cross-entropy loss on the predictions that the associated pairs of samples belong to the same cluster. This way, the feature representation of the network will evolve such that similar samples in this feature space will belong to the same linearly separated cluster. Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation. We show that our approach…
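The pairwise objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract only mentions "a similarity metric", so the cosine similarity and fixed threshold used here are assumptions, as is the choice to score a pair by the inner product of softmax outputs from a single linear layer. All function and parameter names (`pairwise_targets`, `pairwise_bce_loss`, `threshold`) are hypothetical.

```python
import numpy as np

def pairwise_targets(features, threshold=0.9):
    """Binary connections between minibatch samples.

    Assumption: two samples are connected when their cosine similarity
    exceeds `threshold`; the abstract does not fix the metric or the rule.
    """
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = normed @ normed.T
    return (similarity > threshold).astype(np.float64)

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def pairwise_bce_loss(features, weights, bias, threshold=0.9, eps=1e-7):
    """Binary cross-entropy between connections and same-cluster predictions."""
    targets = pairwise_targets(features, threshold)
    # A linear layer followed by softmax gives per-sample cluster probabilities,
    # so the resulting cluster boundaries are linear in feature space.
    probs = softmax(features @ weights + bias)
    # Assumed pair score: probability that samples i and j pick the same cluster.
    same_cluster = np.clip(probs @ probs.T, eps, 1.0 - eps)
    bce = -(targets * np.log(same_cluster)
            + (1.0 - targets) * np.log(1.0 - same_cluster))
    return bce.mean()

# Toy minibatch: two tight groups of 2-D features.
features = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]])
separating = pairwise_bce_loss(features, 10.0 * np.eye(2), np.zeros(2))
uninformative = pairwise_bce_loss(features, np.zeros((2, 2)), np.zeros(2))
print(separating < uninformative)
```

In this toy setting, a linear layer that separates the two groups yields a lower loss than one that assigns every sample a uniform cluster distribution, which is the pressure toward linearly separated clusters that the abstract describes.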
