ClusterFit: Improving Generalization of Visual Representations
Xueting Yan, Ishan Misra, Abhinav Gupta, Deepti Ghadiyaram, and Dhruv Mahajan

TL;DR
ClusterFit improves the generalization of pre-trained visual representations by clustering features extracted from a pre-trained network with k-means and re-training a new network from scratch on the cluster assignments as pseudo-labels, which reduces overfitting to the pre-training objective across various pre-training frameworks and tasks.
Contribution
The paper introduces ClusterFit, a novel clustering-based strategy that improves the robustness and transferability of pre-trained visual representations across multiple modalities and tasks.
Findings
Significant improvement in transfer learning performance across 11 datasets.
Effective reduction of overfitting to pre-training objectives.
Compatibility with various pre-training frameworks and modalities.
Abstract
Pre-training convolutional neural networks with weakly-supervised and self-supervised strategies is becoming increasingly popular for several computer vision tasks. However, due to the lack of strong discriminative signals, these learned representations may overfit to the pre-training objective (e.g., hashtag prediction) and not generalize well to downstream tasks. In this work, we present a simple strategy, ClusterFit (CF), to improve the robustness of the visual representations learned during pre-training. Given a dataset, we (a) cluster its features extracted from a pre-trained network using k-means and (b) re-train a new network from scratch on this dataset using cluster assignments as pseudo-labels. We empirically show that clustering helps reduce the pre-training task-specific information from the extracted features thereby minimizing overfitting to the same. Our approach is…
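The two steps described in the abstract, (a) k-means over features from a pre-trained network and (b) training a new model from scratch on the cluster assignments, can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic 2-D inputs, the random linear `projection` standing in for a pre-trained network, the linear softmax classifier standing in for the re-trained network, and all function names and hyperparameters are hypothetical.

```python
import numpy as np


def kmeans(features, k, n_iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    idx = rng.choice(len(features), size=k, replace=False)
    centroids = features[idx].copy()

    def assign(cents):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((features[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iters):
        assignments = assign(centroids)
        for j in range(k):
            members = features[assignments == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(centroids)


def train_softmax(inputs, labels, n_classes, lr=0.1, n_epochs=300, seed=0):
    """Train a linear softmax classifier from scratch (toy stand-in for a network)."""
    rng = np.random.default_rng(seed)
    weights = 0.01 * rng.standard_normal((inputs.shape[1], n_classes))
    bias = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[labels]
    for _ in range(n_epochs):
        logits = inputs @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - one_hot) / len(inputs)  # cross-entropy gradient w.r.t. logits
        weights -= lr * (inputs.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def clusterfit(inputs, pretrained_features, k):
    """Step (a): cluster pre-trained features. Step (b): fit a fresh model on pseudo-labels."""
    _, pseudo_labels = kmeans(pretrained_features, k)
    weights, bias = train_softmax(inputs, pseudo_labels, k)
    return pseudo_labels, weights, bias


# Toy data: three Gaussian blobs in 2-D (assumption, for illustration only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
inputs = np.concatenate([c + 0.5 * rng.standard_normal((100, 2)) for c in centers])

# Hypothetical stand-in for a pre-trained network: a fixed random linear map.
projection = rng.standard_normal((2, 8))
pretrained_features = inputs @ projection

pseudo_labels, weights, bias = clusterfit(inputs, pretrained_features, k=3)
predictions = (inputs @ weights + bias).argmax(axis=1)
accuracy = float((predictions == pseudo_labels).mean())
print(f"agreement with pseudo-labels: {accuracy:.2f}")
```

In this sketch the new model never sees the original pre-training labels, only the cluster indices, which is the property the abstract credits with reducing pre-training task-specific information. The number of clusters `k` is a free choice here.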
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
Methods: ClusterFit
