MIX'EM: Unsupervised Image Classification using a Mixture of Embeddings
Ali Varamesh, Tinne Tuytelaars

TL;DR
MIX'EM introduces an unsupervised image classification method that combines a mixture of embeddings with contrastive learning to produce high-quality, semantically meaningful clusters without labels.
Contribution
The paper proposes a novel mixture of embeddings module integrated into contrastive learning, enabling effective unsupervised image classification with state-of-the-art results.
Findings
Achieved 78% accuracy on STL10
Achieved 82% accuracy on CIFAR10
Achieved 44% accuracy on CIFAR100-20
Abstract
We present MIX'EM, a novel solution for unsupervised image classification. MIX'EM generates representations that by themselves are sufficient to drive a general-purpose clustering algorithm to deliver high-quality classification. This is accomplished by building a mixture of embeddings module into a contrastive visual representation learning framework in order to disentangle representations at the category level. It first generates a set of embeddings and mixing coefficients from a given visual representation, and then combines them into a single embedding. We introduce three techniques to successfully train MIX'EM and avoid degenerate solutions: (i) diversify mixture components by maximizing entropy, (ii) minimize instance-conditioned component entropy to enforce a clustered embedding space, and (iii) use an associative embedding loss to enforce semantic separability. By applying (i)…
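The mixing step and the two entropy terms described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the linear heads `W_embed` and `W_mix`, the softmax over mixing logits, the final L2 normalization, and all function names are assumptions made for illustration. Reading technique (i) as the entropy of the batch-averaged mixing coefficients is likewise one plausible interpretation of the abstract, not a confirmed detail. The associative embedding loss (iii) is not shown.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_embeddings(h, W_embed, W_mix):
    """Combine K component embeddings into one embedding per sample.

    h:       (N, D) backbone representations
    W_embed: (K, D, E) hypothetical linear heads, one per component
    W_mix:   (D, K) hypothetical linear head producing mixing logits
    """
    components = np.einsum("nd,kde->nke", h, W_embed)  # (N, K, E)
    p = softmax(h @ W_mix, axis=-1)                    # (N, K) mixing coefficients
    z = np.einsum("nk,nke->ne", p, components)         # (N, E) weighted sum
    # Assumption: L2-normalize, as is common for contrastive embeddings.
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    return z, p

def entropy(p, axis=-1, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=axis)

def entropy_terms(p):
    """Return (diversity, conditional) entropies of mixing coefficients p.

    diversity:   entropy of the batch-averaged coefficients; maximizing it
                 spreads samples across components (assumed reading of (i)).
    conditional: mean per-sample entropy; minimizing it pushes each sample
                 toward a single component (assumed reading of (ii)).
    """
    diversity = entropy(p.mean(axis=0))
    conditional = entropy(p).mean()
    return diversity, conditional
```

Under these assumptions, a training objective would add `conditional - diversity` (with weights chosen by the user) to the contrastive loss, and the row-wise argmax of `p` would give a candidate cluster assignment for each image.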
