Clustering-Induced Generative Incomplete Image-Text Clustering (CIGIT-C)
Dongjin Guo, Xiaoming Su, Jiatai Wang, Limin Liu, Zhiyong Pei, Zhiwei Xu

TL;DR
This paper introduces CIGIT-C, a clustering framework for incomplete multi-modal image-text data. It explores latent inter- and intra-modal connections with generative adversarial networks to improve clustering performance.
Contribution
The paper proposes a new method that addresses incomplete image-text data by leveraging modality-specific encoders and adversarial generation to enhance clustering accuracy.
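To make the "incomplete image-text data" setting concrete, the sketch below shows one possible way to represent samples whose image or text modality may be missing, and to split them by availability. All names here (`Sample`, `partition_by_availability`, the feature fields) are hypothetical illustrations and are not taken from the paper; how CIGIT-C itself encodes or generates missing modalities is not detailed in this summary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Sample:
    """One image-text sample; a modality is None when it is missing."""
    image_feat: Optional[Sequence[float]]
    text_feat: Optional[Sequence[float]]


def partition_by_availability(samples: List[Sample]) -> Dict[str, List[int]]:
    """Group sample indices by which modalities are present."""
    groups: Dict[str, List[int]] = {"paired": [], "image_only": [], "text_only": []}
    for idx, s in enumerate(samples):
        has_image = s.image_feat is not None
        has_text = s.text_feat is not None
        if has_image and has_text:
            groups["paired"].append(idx)
        elif has_image:
            groups["image_only"].append(idx)
        elif has_text:
            groups["text_only"].append(idx)
        else:
            # A sample with no modality at all carries no information.
            raise ValueError(f"sample {idx} has neither image nor text")
    return groups


# Toy data (made up for illustration): four samples, two of them incomplete.
samples = [
    Sample(image_feat=[0.1, 0.9], text_feat=[0.3, 0.7]),
    Sample(image_feat=[0.8, 0.2], text_feat=None),
    Sample(image_feat=None, text_feat=[0.5, 0.5]),
    Sample(image_feat=[0.4, 0.6], text_feat=[0.2, 0.8]),
]
groups = partition_by_availability(samples)
```

In a setup like this, the `image_only` and `text_only` groups are the samples for which a generative model would need to supply a representation of the missing modality before clustering.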
Findings
Outperforms existing methods on public datasets
Effectively handles missing data in multi-modal clustering
Improves clustering accuracy with latent connection exploration
Abstract
The goal of image-text clustering (ITC) is to find correct clusters by integrating the complementary and consistent information across modalities for heterogeneous samples. However, most current studies analyse ITC on the ideal premise that the samples in every modality are complete, a premise that does not always hold in real-world situations. Missing data degrades image-text feature learning and ultimately harms generalization in ITC tasks. Although a series of methods have been proposed to address incomplete image-text clustering (IITC), the following problems remain: 1) Most existing methods hardly consider the distinct gap between heterogeneous feature domains. 2) For missing data, the representations generated by existing methods are rarely guaranteed to suit clustering tasks. 3) Existing…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
