Semantic-Enhanced Image Clustering
Shaotian Cai, Liping Qiu, Xiaojun Chen, Qin Zhang, Longteng Chen

TL;DR
This paper introduces SIC, a novel image clustering method that leverages visual-language pre-training to distinguish semantically different images when only the number of clusters is known, improving clustering accuracy.
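Cluster ids are arbitrary, so clustering accuracy is usually computed after finding the best one-to-one mapping from cluster ids to ground-truth classes. The sketch below illustrates that common convention. It is not the paper's evaluation code. The function name clustering_accuracy is hypothetical, and the sketch assumes integer labels in 0..K-1 with as many clusters as classes.

```python
from itertools import permutations

import numpy as np


def clustering_accuracy(y_true, y_pred, n_clusters):
    """Best accuracy over all one-to-one mappings from cluster ids to class ids.

    Assumes labels are integers in 0..n_clusters-1 and that there are as many
    clusters as classes. Brute force over n_clusters! permutations, so only
    practical for small K.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # counts[c, t] = number of samples placed in cluster c whose true class is t
    counts = np.zeros((n_clusters, n_clusters), dtype=int)
    np.add.at(counts, (y_pred, y_true), 1)
    # Try every mapping cluster c -> class perm[c] and keep the most matches
    best = max(
        sum(counts[c, perm[c]] for c in range(n_clusters))
        for perm in permutations(range(n_clusters))
    )
    return best / len(y_true)
```

For example, clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0], 2) returns 1.0, because swapping the two cluster ids matches every sample. Brute force keeps the sketch dependency-free. For many clusters, scipy.optimize.linear_sum_assignment with maximize=True on the count matrix finds the same optimal mapping.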
Contribution
The paper proposes a new clustering approach built on CLIP. It maps images into a semantic space, generates pseudo-labels, and applies self-supervised consistency learning, and it backs the method with a theoretical convergence analysis and a risk analysis.
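The first two steps named above can be illustrated with a small NumPy sketch. It is not the authors' implementation. The mapping into the semantic space (a softmax-weighted mix of each image's top-k nearest word embeddings), the use of k-means on concatenated image and semantic features to produce pseudo-labels, and all function names and parameters (top_k, temperature) are assumptions chosen for illustration. In practice the embeddings would come from CLIP's image and text encoders; here any (n, d) arrays work.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def map_to_semantic_space(image_emb, word_emb, top_k=5, temperature=0.01):
    """Assumed mapping: each image becomes a softmax-weighted mix of its top-k word embeddings.

    image_emb: (n, d) image features; word_emb: (m, d) text features of candidate nouns,
    both assumed to come from the same vision-language encoder (e.g. CLIP).
    """
    img, words = l2_normalize(image_emb), l2_normalize(word_emb)
    sims = img @ words.T                                   # (n, m) cosine similarities
    idx = np.argsort(-sims, axis=1)[:, :top_k]             # k most similar words per image
    logits = np.take_along_axis(sims, idx, axis=1) / temperature
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)          # softmax over the k words
    semantic = np.einsum("nk,nkd->nd", weights, words[idx])
    return l2_normalize(semantic)


def kmeans(x, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster id per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), n_clusters, replace=False)]  # copy of K random rows
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):                        # keep an empty cluster's old center
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def pseudo_labels(image_emb, word_emb, n_clusters, top_k=5):
    """Assumption: cluster the concatenation of image and semantic features with k-means."""
    semantic = map_to_semantic_space(image_emb, word_emb, top_k=top_k)
    joint = np.concatenate([l2_normalize(image_emb), semantic], axis=1)
    return kmeans(joint, n_clusters)
```

Synthetic vectors grouped around a few shared "concept" directions can stand in for CLIP features when trying the functions. The consistency-learning step that would train a clustering head on these pseudo-labels is omitted.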
Findings
Outperforms existing methods on five benchmark datasets.
Converges at a sublinear rate, with a theoretical guarantee.
Reduces expected risk by enhancing neighborhood consistency.
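Neighborhood consistency is commonly enforced by rewarding each image for receiving the same soft cluster assignment as its nearest neighbors, plus an entropy term that keeps all clusters in use. The NumPy sketch below computes a SCAN-style loss of that form. Whether SIC's objective matches it term by term is an assumption, and the names knn_indices, neighborhood_consistency_loss, and entropy_weight are hypothetical.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def knn_indices(emb, k):
    """Indices of the k nearest neighbours of each row by cosine similarity (self excluded)."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = x @ x.T
    np.fill_diagonal(sims, -np.inf)          # never pick a point as its own neighbour
    return np.argsort(-sims, axis=1)[:, :k]


def neighborhood_consistency_loss(logits, neighbors, entropy_weight=5.0, eps=1e-8):
    """Assumed SCAN-style objective: neighbour agreement minus weighted entropy of mean assignment.

    logits: (n, K) cluster scores; neighbors: (n, k) neighbour indices.
    """
    probs = softmax(logits)                                       # (n, K)
    # p_i . p_j for every image i and each of its neighbours j
    agree = np.einsum("nk,njk->nj", probs, probs[neighbors])      # (n, k)
    consistency = -np.log(agree + eps).mean()
    mean_p = probs.mean(axis=0)                                   # average assignment per cluster
    entropy = -(mean_p * np.log(mean_p + eps)).sum()
    return consistency - entropy_weight * entropy
```

Lower is better: assignments that agree across neighbors drive the first term toward zero, while the entropy term penalizes collapsing every image into a single cluster.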
Abstract
Image clustering is an important and open challenge in computer vision. Although many methods have been proposed for image clustering, they explore only the images and uncover clusters from image features alone, so they cannot distinguish visually similar but semantically different images. In this paper, we investigate image clustering with the help of a visual-language pre-training model. Unlike the zero-shot setting, in which the class names are known, in this setting only the number of clusters is known. Two key problems therefore arise: how to map images to a proper semantic space, and how to cluster images using both the image and semantic spaces. To solve these problems, we propose a novel image clustering method guided by the visual-language pre-training model CLIP, named Semantic-Enhanced Image Clustering (SIC). In…
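To make the contrast in the abstract concrete, zero-shot CLIP classification needs one text embedding per known class name and simply assigns each image to the most similar one. The short sketch below shows that baseline; the function name zero_shot_predict is hypothetical. In the clustering setting described here, class_text_emb is not available and only its row count K is known, which is why semantic directions and clusters must be discovered instead.

```python
import numpy as np


def zero_shot_predict(image_emb, class_text_emb):
    """Zero-shot baseline: assign each image to the class whose text embedding is most similar.

    class_text_emb holds one embedding per known class name (e.g. CLIP text features of a
    prompt such as "a photo of a {name}"); this is exactly what the clustering setting lacks.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = class_text_emb / np.linalg.norm(class_text_emb, axis=1, keepdims=True)
    return (img @ txt.T).argmax(axis=1)  # (n_images,) index of the best-matching class name
```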
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · COVID-19 diagnosis using AI
Methods: Contrastive Language-Image Pre-training
