Clustering by Maximizing Mutual Information Across Views
Kien Do, Truyen Tran, Svetha Venkatesh

TL;DR
This paper introduces an end-to-end image clustering framework that jointly learns representations and cluster assignments by maximizing mutual information across views, improving accuracy by 5-7% over existing methods on multiple datasets.
Contribution
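It proposes a dual-head model, with a "representation learning" head and a "clustering" head sharing one backbone, and a novel "log-of-dot-product" critic function for the clustering head's contrastive loss, improving clustering accuracy over state-of-the-art methods.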
Findings
Outperforms state-of-the-art single-stage methods by 5-7% in accuracy on multiple datasets.
Effective joint learning of representations and clustering in a single model.
Two-stage variant also shows superior results on challenging datasets.
Abstract
We propose a novel framework for image clustering that incorporates joint representation learning and clustering. Our method consists of two heads that share the same backbone network: a "representation learning" head and a "clustering" head. The "representation learning" head captures fine-grained patterns of objects at the instance level, which serve as clues for the "clustering" head to extract coarse-grained information that separates objects into clusters. The whole model is trained in an end-to-end manner by minimizing the weighted sum of two sample-oriented contrastive losses applied to the outputs of the two heads. To ensure that the contrastive loss corresponding to the "clustering" head is optimal, we introduce a novel critic function called "log-of-dot-product". Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art single-stage…
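The training objective described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the InfoNCE-style form of the contrastive loss, the temperature-scaled cosine critic used for the "representation learning" head, the `weight` and `temperature` parameters, and all function names are assumptions made for illustration. Only the "log-of-dot-product" critic and the weighted sum of two contrastive losses come from the abstract.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_dot_critic(p, q, eps=1e-12):
    # "Log-of-dot-product" critic on two cluster-probability vectors.
    # eps is an assumed guard against log(0).
    return math.log(dot(p, q) + eps)

def cosine_critic(u, v, temperature=0.1):
    # Assumed critic for the representation head: scaled cosine similarity.
    # Inputs are assumed to be non-zero vectors.
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / (norm * temperature)

def contrastive_loss(view_a, view_b, critic):
    # InfoNCE-style loss (assumed form): sample i in view_a should score
    # highest against sample i in view_b; other samples act as negatives.
    n = len(view_a)
    total = 0.0
    for i in range(n):
        scores = [critic(view_a[i], view_b[j]) for j in range(n)]
        top = max(scores)
        # Numerically stable log-sum-exp over all candidates.
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[i]
    return total / n

def total_loss(probs_a, probs_b, feats_a, feats_b, weight=1.0):
    # Weighted sum of the two head losses; `weight` is a hypothetical name.
    cluster_term = contrastive_loss(probs_a, probs_b, log_dot_critic)
    feature_term = contrastive_loss(feats_a, feats_b, cosine_critic)
    return cluster_term + weight * feature_term
```

With the log-of-dot-product critic, the exponential in the contrastive loss cancels the logarithm, so each term reduces to the negative log of one dot product divided by the sum of dot products over all candidates. Under this sketch, two views that agree on confident cluster assignments yield a lower clustering loss than views that disagree.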
