Dual-Level Cross-Modal Contrastive Clustering
Haixin Zhang, Yongjun Li, Dong Huang

TL;DR
This paper introduces DXMC, a novel clustering framework that leverages external textual information and dual-level cross-modal contrastive learning to improve image clustering accuracy by integrating visual and textual semantic representations.
Contribution
The paper proposes DXMC, a dual-level cross-modal contrastive clustering method that uses external textual information to improve image clustering.
Findings
Outperforms existing methods on five benchmark datasets
Effectively integrates visual and textual modalities for clustering
Demonstrates significant improvements in clustering accuracy
Abstract
Image clustering, which involves grouping images into different clusters without labels, is a key task in unsupervised learning. Although previous deep clustering methods have achieved remarkable results, they explore only the intrinsic information of the image itself and overlook external supervision knowledge that could improve the semantic understanding of images. Recently, visual-language models pre-trained on large-scale datasets have been used in various downstream tasks and have achieved great results. However, there is a gap between visual representation learning and textual semantic learning, and how to properly utilize the representations of two different modalities for clustering is still a big challenge. To tackle these challenges, we propose a novel image clustering framework, named Dual-level Cross-Modal Contrastive Clustering (DXMC). Firstly, external textual information is introduced…
Taxonomy
Topics: Advanced Clustering Algorithms Research
Methods: Contrastive Learning
