Rethinking Pre-training and Self-training

Barret Zoph; Golnaz Ghiasi; Tsung-Yi Lin; Yin Cui; Hanxiao Liu; Ekin D. Cubuk; Quoc V. Le

arXiv:2006.06882 · cs.CV · November 17, 2020 · 366 cites

TL;DR

This paper compares ImageNet pre-training with self-training as ways to use additional data for object detection and segmentation. Self-training stays helpful under stronger data augmentation and in both low-data and high-data regimes, whereas the value of pre-training shrinks as augmentation gets stronger and more labeled data becomes available.

Contribution

The study shows that self-training is general and flexible: it helps across data regimes and augmentation strengths, and it improves upon pre-training in the cases where pre-training is itself helpful.

Findings

01

Self-training consistently improves performance across datasets.

02

The value of pre-training diminishes with stronger data augmentation and more labeled data.

03

Self-training helps in both low-data and high-data regimes, and improves upon pre-training where pre-training is helpful.

Abstract

Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet pre-training is commonly used to initialize the backbones of object detection and segmentation models. He et al., however, show a surprising result that ImageNet pre-training has limited impact on COCO object detection. Here we investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training. Our study reveals the generality and flexibility of self-training with three additional insights: 1) stronger data augmentation and more labeled data further diminish the value of pre-training, 2) unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes, and 3) in the case that pre-training is helpful, self-training improves upon pre-training. For example, on the COCO…
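The abstract names self-training but does not spell out the procedure. As a rough illustration only, the sketch below assumes the common teacher-student form of self-training: a teacher is fit on labeled data, it assigns pseudo-labels to unlabeled data, and a student is fit on both sets together. The nearest-centroid classifier, the function names (fit_centroids, predict, self_train) and the toy 2-D points are all hypothetical stand-ins chosen to keep the example small; they are not the detection or segmentation models used in the paper.

```python
def fit_centroids(points, labels):
    """Fit a toy nearest-centroid classifier: one mean point per class."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}


def predict(centroids, point):
    """Return the class whose centroid is closest to the point."""
    def sq_dist(c):
        cx, cy = centroids[c]
        return (point[0] - cx) ** 2 + (point[1] - cy) ** 2
    return min(centroids, key=sq_dist)


def self_train(labeled_x, labeled_y, unlabeled_x):
    """Assumed self-training loop: teacher -> pseudo-labels -> student."""
    # 1) Fit the teacher on human-labeled data only.
    teacher = fit_centroids(labeled_x, labeled_y)
    # 2) Use the teacher to pseudo-label the unlabeled data.
    pseudo_y = [predict(teacher, p) for p in unlabeled_x]
    # 3) Fit the student on human labels and pseudo-labels jointly.
    student = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
    return teacher, student, pseudo_y


# Toy data (hypothetical): two labeled points, four unlabeled points.
labeled_x = [(0.0, 0.0), (4.0, 4.0)]
labeled_y = [0, 1]
unlabeled_x = [(0.5, 0.5), (1.0, 0.0), (3.5, 4.5), (5.0, 4.0)]

teacher, student, pseudo_y = self_train(labeled_x, labeled_y, unlabeled_x)
print("pseudo-labels:", pseudo_y)
print("student centroids:", student)
```

In this toy setting the student's centroids shift toward the unlabeled points, which is the sense in which self-training uses additional data without extra human labels. Pre-training, by contrast, would use the extra data to initialize the model before fitting on the labeled set.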

Code & Models

Videos

Rethinking Pre-training and Self-training · SlidesLive

Taxonomy

Topics: Human Resource Development and Performance Evaluation

Methods: Average Pooling · Neural Architecture Search · Global Average Pooling · NAS-FPN · Entropy Regularization · Residual Connection · Softmax · Bottleneck Residual Block · Batch Normalization