TL;DR
This paper introduces a new unsupervised pre-training method for image features using large-scale uncurated data, achieving state-of-the-art results and improving supervised classification accuracy.
Contribution
The paper presents a novel self-supervised clustering approach that effectively leverages massive uncurated datasets for visual feature learning.
Findings
Achieved state-of-the-art results on standard benchmarks for unsupervised methods.
Pre-training with our method improves supervised ImageNet classification accuracy.
Validated the effectiveness of unsupervised learning on 96 million uncurated images.
Abstract
Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Most recent efforts in unsupervised feature learning have focused on either small or highly curated datasets like ImageNet, whereas using uncurated raw datasets was found to decrease the feature quality when evaluated on a transfer task. Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available. To that effect, we propose a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data. We validate our approach on 96 million images from YFCC100M, achieving state-of-the-art results among unsupervised methods on standard benchmarks, which confirms the potential…
**Leveraging Large-Scale Uncurated Data for Unsupervised Learning of Visual Features**
**Mathilde Caron, Piotr Bojanowski, Armand Joulin and Julien Mairal**
• Goal: learning general-purpose visual features with convnets on large-scale, unlabelled and uncurated datasets.
• Motivation:
  – bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available;
  – propose a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data.
• Method: our approach iterates between:
  – hierarchical clustering of the features;
  – updating convnet weights by predicting both rotation angle and cluster assignment in a single hierarchical loss.
• Results: features pre-trained on 96M images from YFCC100M reach state-of-the-art performance on standard evaluation benchmarks with a VGG architecture.
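The clustering half of the alternation above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes plain k-means at both levels (the text here only says "hierarchical clustering"), a deterministic first-k initialisation, and features given as lists of floats. The names `kmeans` and `two_level_clustering` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point.

    Assumption for this sketch: centroids start at the first k points,
    which keeps the example deterministic.
    """
    k = min(k, len(points))
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        assign = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of the members of each cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def two_level_clustering(features, n_super, n_sub):
    """Cluster into super-classes, then re-cluster inside each super-class."""
    super_ids = kmeans(features, n_super)
    sub_ids = [0] * len(features)
    for s in set(super_ids):
        idx = [i for i, a in enumerate(super_ids) if a == s]
        local = kmeans([features[i] for i in idx], n_sub)
        for i, a in zip(idx, local):
            sub_ids[i] = a
    return super_ids, sub_ids
```

In a full training loop, the returned super-class and sub-class indices would serve as the targets for the next round of weight updates, after which the features are recomputed and clustered again.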
**Overview**
**Illustration of our approach**
• A large set of $N$ unlabelled images $x_1, \dots, x_N$.
• $f_\theta$ is the convnet mapping (with $\theta$ the set of corresponding parameters).
• We partition the target labels into a 2-level hierarchy:
  – Super-classes: $y_n$ is the super-class assignment vector in $\{0,1\}^S$ of the image $x_n$;
  – Sub-classes: partitioning within each super-class. $z_n^s$ is the vector in $\{0,1\}^{k_s}$ of the assignment into $k_s$ sub-classes for an image $x_n$ belonging to super-class $s$.
• Parameters of linear classifiers $(V, W_1, \dots, W_S)$ and $\theta$ are learned by minimizing:
$$\frac{1}{N}\sum_{n=1}^{N}\left[\ell\big(V f_{\theta}(x_{n}), y_{n}\big) + \sum_{s=1}^{S} y_{ns}\,\ell\big(W_{s} f_{\theta}(x_{n}), z^{s}_{n}\big)\right],$$
where $\ell$ is the negative log-softmax function.
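To make the objective concrete, here is a small pure-Python sketch of the loss for a fixed set of features. It assumes the one-hot assignment vectors are stored as integer indices (so the factor $y_{ns}$ simply selects the sub-class classifier of the assigned super-class), and that `V` and each `W[s]` are plain lists of rows. The function names are illustrative only; nothing here updates the convnet parameters.

```python
import math


def neg_log_softmax(scores, target):
    """Negative log-softmax of `scores`, evaluated at index `target`."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def hierarchical_loss(features, V, W, y, z):
    """Average two-level classification loss.

    features: list of N feature vectors, standing in for f_theta(x_n)
    V:        super-class classifier, S rows
    W:        list of S sub-class classifiers, W[s] has k_s rows
    y:        super-class index of each image
    z:        sub-class index of each image inside its super-class
    """
    total = 0.0
    for f, s, k in zip(features, y, z):
        # Super-class term: ell(V f, y_n).
        total += neg_log_softmax(matvec(V, f), s)
        # Sub-class term: y_ns is one-hot, so only W[s] contributes.
        total += neg_log_softmax(matvec(W[s], f), k)
    return total / len(features)
```

With all classifier weights at zero, every prediction is uniform, so the loss reduces to $\log S + \log k_s$; this makes a convenient sanity check before training.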
**Method**
Score columns: Classif. (fc68, all) and Detect. (fc68, all); the numeric values are not preserved in this copy.

| Method | Data |
| --- | --- |
| ImageNet labels | ImageNet |
| Unsupervised on curated data | |
| Larsson et al. [larsson2017colorization] | ImageNet + Places |
| Doersch et al. [doersch2015unsupervised] | ImageNet |
| Caron et al. [caron2018deep] | ImageNet |
| Unsupervised on uncurated data | |
| Mahendran et al. [mahendran2018cross] | YFCC100M videos |
| Wang and Gupta [wang2015unsupervised] | Youtube8M |
| Wang et al. [wang2017transitive] | Youtube9M |
| Our method | YFCC100M |
**Transfer learning to Pascal VOC 2007**
We train logistic regressions on top of frozen convolutional layers at different depths.
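As a rough illustration of this protocol, the sketch below fits a binary logistic regression on precomputed features, which stand in for the activations of a frozen layer. Only the classifier weights `w` and `b` are updated; the features are never modified. The full-batch gradient descent, learning rate, and epoch count are assumptions chosen for the example, not settings taken from the paper.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def train_logreg(feats, labels, lr=0.5, epochs=200):
    """Fit a binary logistic regression by full-batch gradient descent.

    feats:  list of fixed feature vectors (outputs of a frozen layer)
    labels: list of 0/1 targets
    """
    d, n = len(feats[0]), len(feats)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(feats, labels):
            # Gradient of the log-loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Return 1 if the logit is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

Evaluating at different depths then amounts to repeating `train_logreg` once per layer, each time on that layer's frozen features.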
**Comparing with methods on YFCC100M**
We report validation mAP on the Pascal VOC classification task (fc68 setting).
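For readers unfamiliar with the metric, mAP is the mean over classes of the average precision (AP) of a ranked list of predictions. The sketch below uses the plain, non-interpolated definition of AP as an assumption; the official Pascal VOC 2007 protocol uses an 11-point interpolated variant, so absolute numbers from this code would differ slightly from benchmark figures.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: predicted confidence for each image
    labels: 1 if the image contains the class, else 0
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at this recall point
    return total / hits if hits else 0.0


def mean_average_precision(scores_per_class, labels_per_class):
    """Mean of per-class APs; each argument holds one list per class."""
    aps = [
        average_precision(s, l)
        for s, l in zip(scores_per_class, labels_per_class)
    ]
    return sum(aps) / len(aps)
```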
**Amounts of images and clusters**
We display random images from clusters that are pure with respect to a given type of metadata. The bottom row depicts clusters that are pure for GPS coordinates but impure for user IDs.
Top row: tag: cat | tag: elephantparadelondon | tag: always | device: CanoScan
Bottom row: four clusters labelled by GPS coordinates (coordinate values not preserved)
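One simple way to formalise "pure for a certain metadata" is the share of a cluster's images that carry the cluster's most common value for that field. The sketch below uses this definition together with an arbitrary threshold; both are assumptions made for illustration, since the text does not state how purity was measured. The field names `gps` and `user` are hypothetical.

```python
from collections import Counter


def cluster_purity(values):
    """Fraction of items sharing the most common metadata value."""
    counts = Counter(values)
    return counts.most_common(1)[0][1] / len(values)


def pure_for_but_not(clusters, pure_field, impure_field, threshold=0.9):
    """Ids of clusters pure for one field and impure for another.

    clusters: dict mapping cluster id -> list of metadata dicts
    threshold: assumed cut-off above which a cluster counts as pure
    """
    selected = []
    for cid, items in clusters.items():
        p = cluster_purity([it[pure_field] for it in items])
        q = cluster_purity([it[impure_field] for it in items])
        if p >= threshold and q < threshold:
            selected.append(cid)
    return selected
```

Under this definition, a cluster of photos all taken at one location by many different users would be selected by `pure_for_but_not(clusters, "gps", "user")`, which matches the kind of cluster shown in the bottom row.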
**Clustering quality**
**References**
