Rethinking Pre-training and Self-training
Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

TL;DR
This paper compares pre-training and self-training in computer vision. It shows that self-training stays helpful under stronger data augmentation and in both low-data and high-data regimes, where the value of pre-training diminishes, and that it improves detection and segmentation results.
Contribution
The study demonstrates the generality and effectiveness of self-training over traditional pre-training across multiple datasets and conditions, highlighting its potential as a superior training paradigm.
Findings
Self-training consistently improves performance across datasets.
Pre-training benefits diminish with stronger data augmentation.
Self-training outperforms pre-training in both low-data and high-data regimes.
Abstract
Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet pre-training is commonly used to initialize the backbones of object detection and segmentation models. He et al., however, show a surprising result that ImageNet pre-training has limited impact on COCO object detection. Here we investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training. Our study reveals the generality and flexibility of self-training with three additional insights: 1) stronger data augmentation and more labeled data further diminish the value of pre-training, 2) unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes, and 3) in the case that pre-training is helpful, self-training improves upon pre-training. For example, on the COCO…
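The self-training recipe the abstract refers to can be illustrated with a minimal sketch: a teacher is fit on labeled data, it pseudo-labels unlabeled data, and a student is fit on the human labels and pseudo-labels together. Everything specific here is an assumption made for illustration and not the paper's setup: the toy nearest-centroid classifier stands in for a detection or segmentation model, and the names `fit_centroids`, `predict_proba`, `self_train`, and the `threshold` value are hypothetical.

```python
import numpy as np

def fit_centroids(x, y, num_classes):
    """Fit a toy nearest-centroid classifier: one mean vector per class.

    Assumes every class has at least one example in (x, y).
    """
    return np.stack([x[y == c].mean(axis=0) for c in range(num_classes)])

def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def self_train(x_lab, y_lab, x_unlab, num_classes, threshold=0.9):
    """One round of self-training (illustrative, not the paper's code).

    1. Fit a teacher on the labeled data.
    2. Pseudo-label the unlabeled data with the teacher.
    3. Keep only confident pseudo-labels (threshold is an assumed value).
    4. Fit a student on human labels plus kept pseudo-labels.
    """
    teacher = fit_centroids(x_lab, y_lab, num_classes)
    probs = predict_proba(teacher, x_unlab)
    pseudo = probs.argmax(axis=1)
    keep = probs.max(axis=1) >= threshold
    x_all = np.concatenate([x_lab, x_unlab[keep]])
    y_all = np.concatenate([y_lab, pseudo[keep]])
    student = fit_centroids(x_all, y_all, num_classes)
    return teacher, student, int(keep.sum())
```

Calling `self_train(x_lab, y_lab, x_unlab, num_classes=2)` returns the teacher centroids, the student centroids, and the number of pseudo-labeled examples that passed the confidence filter. In the setting the abstract describes, the labeled set would be a detection dataset such as COCO and the extra data would be ImageNet images used without their labels.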
Taxonomy
Topics: Human Resource Development and Performance Evaluation
Methods: Average Pooling · Neural Architecture Search · Global Average Pooling · NAS-FPN · Entropy Regularization · Residual Connection · Softmax · Bottleneck Residual Block · Batch Normalization
