Improve Contrastive Clustering Performance by Multiple Fusing-Augmenting ViT Blocks
Cheng Wang, Shuisheng Zhou, Fengjiao Peng, Jin Sheng, Feng Ye, Yinli Dong

TL;DR
This paper introduces MFAVBs, a ViT-based architecture that explicitly fuses the features of the two augmented views in each positive pair across multiple fusing-augmenting blocks, improving contrastive clustering performance on seven datasets.
Contribution
The paper proposes a new multiple fusing-augmenting ViT blocks (MFAVBs) architecture that enhances contrastive clustering by explicitly fusing features from positive pairs and integrating CLIP features.
Findings
MFAVBs outperform state-of-the-art methods on seven datasets.
Explicit feature fusion improves clustering accuracy.
Using CLIP features enhances model discriminability.
Abstract
In the field of image clustering, the widely used contrastive learning networks improve clustering performance by maximizing the similarity between positive pairs and the dissimilarity of negative pairs of the inputs. Extant contrastive learning networks, whose two encoders often interact only implicitly through parameter sharing or momentum updating, may not fully exploit the complementarity and similarity of the positive pairs to extract clustering features from input data. To explicitly fuse the learned features of positive pairs, we design novel multiple fusing-augmenting ViT blocks (MFAVBs) based on the excellent feature learning ability of Vision Transformers (ViT). First, two preprocessed augmentations, forming a positive pair, are fed separately into two shared-weight ViTs; their output features are then fused and fed into a larger ViT. Secondly, the learned features are split…
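The fuse-then-split step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `ToyAttentionBlock` and `fuse_augment_step` are hypothetical names, a single-head attention layer stands in for a full ViT, fusion is assumed to be concatenation along the token axis, and the "larger ViT" is assumed to be a block that attends over the doubled token sequence. The truncated abstract does not confirm any of these choices.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToyAttentionBlock:
    """Single-head self-attention with a residual connection.

    Hypothetical stand-in for a ViT block; not the paper's architecture.
    """

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        self.wq = rng.normal(0.0, scale, (dim, dim))
        self.wk = rng.normal(0.0, scale, (dim, dim))
        self.wv = rng.normal(0.0, scale, (dim, dim))

    def __call__(self, tokens):
        # tokens: (batch, n_tokens, dim)
        q, k, v = tokens @ self.wq, tokens @ self.wk, tokens @ self.wv
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(self.dim))
        return tokens + attn @ v


def fuse_augment_step(view_a, view_b, shared_block, fused_block):
    # Both views pass through the SAME block, i.e. shared weights.
    feat_a = shared_block(view_a)
    feat_b = shared_block(view_b)
    # Assumption: "fusing" is modelled as token-axis concatenation.
    fused = np.concatenate([feat_a, feat_b], axis=1)
    # The block over the fused sequence lets tokens of one view attend to the other.
    fused = fused_block(fused)
    # Split the fused sequence back into one feature set per view.
    n_tokens = feat_a.shape[1]
    return fused[:, :n_tokens], fused[:, n_tokens:]


rng = np.random.default_rng(0)
batch, n_tokens, dim = 4, 5, 8
shared = ToyAttentionBlock(dim, rng)
fused_blk = ToyAttentionBlock(dim, rng)
view_a = rng.normal(size=(batch, n_tokens, dim))
view_b = rng.normal(size=(batch, n_tokens, dim))

out_a, out_b = fuse_augment_step(view_a, view_b, shared, fused_blk)
print(out_a.shape, out_b.shape)
```

In this sketch the features returned for one view depend on the other view, which is the explicit interaction the abstract contrasts with parameter sharing or momentum updating alone.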
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Remote-Sensing Image Classification
