Exploring the Limits of Large Scale Pre-training

Samira Abnar; Mostafa Dehghani; Behnam Neyshabur; Hanie Sedghi

arXiv:2110.02095 · cs.LG · October 6, 2021 · 35 citations

TL;DR

This paper systematically investigates the limits of large-scale pre-training in vision models. It finds that downstream performance saturates as upstream accuracy improves, so upstream gains do not always translate into better downstream results, and it relates this saturation to how the models' representations evolve.

Contribution

The study provides a comprehensive analysis of the saturation phenomenon in large-scale pre-training and introduces a model capturing the nonlinear relationship between upstream and downstream performance.

Findings

01. Downstream performance saturates with increased upstream accuracy.

02. Representation evolution explains performance saturation.

03. Better downstream results may require sacrificing upstream accuracy.

Abstract

Recent developments in large-scale machine learning suggest that by properly scaling up data, model size and training time, one might observe that improvements in pre-training transfer favorably to most downstream tasks. In this work, we systematically study this phenomenon and establish that, as we increase the upstream accuracy, the performance of downstream tasks saturates. In particular, we investigate more than 4800 experiments on Vision Transformers, MLP-Mixers and ResNets with parameter counts ranging from ten million to ten billion, trained on the largest available image datasets (JFT, ImageNet21K) and evaluated on more than 20 downstream image recognition tasks. We propose a model for downstream performance that reflects the saturation phenomenon and captures the nonlinear relationship between upstream and downstream performance. Delving deeper to understand…
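
The abstract describes a model in which downstream performance saturates as upstream performance improves, but this page does not give its functional form. As a rough illustration only, the sketch below assumes a saturating power law in error rates, e_DS = k * e_US^alpha + e_IR, where e_IR is a floor the downstream error cannot go below. The function names, the grid-search fitting procedure, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import math


def downstream_error(upstream_error, k, alpha, irreducible):
    """Assumed saturating power law: e_DS = k * e_US**alpha + irreducible."""
    return k * upstream_error ** alpha + irreducible


def fit_saturating_power_law(upstream_errors, downstream_errors, grid_steps=200):
    """Fit (k, alpha, irreducible) with a grid search over the error floor.

    For a fixed candidate floor, log(e_DS - floor) is linear in log(e_US),
    so k and alpha follow from ordinary least squares in log space.
    The floor with the smallest log-space residual is kept.
    """
    lowest = min(downstream_errors)
    xs = [math.log(e) for e in upstream_errors]
    n = len(xs)
    mean_x = sum(xs) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)

    best = None
    for step in range(grid_steps):
        floor = lowest * step / grid_steps  # candidates in [0, lowest)
        ys = [math.log(e - floor) for e in downstream_errors]
        mean_y = sum(ys) / n
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        alpha = cov_xy / var_x
        log_k = mean_y - alpha * mean_x
        residual = sum((log_k + alpha * x - y) ** 2 for x, y in zip(xs, ys))
        if best is None or residual < best[0]:
            best = (residual, math.exp(log_k), alpha, floor)

    _, k, alpha, floor = best
    return k, alpha, floor


# Synthetic (made-up) points generated from the assumed form itself.
true_k, true_alpha, true_floor = 0.8, 1.5, 0.1
upstream = [0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
downstream = [downstream_error(e, true_k, true_alpha, true_floor) for e in upstream]

k_hat, alpha_hat, floor_hat = fit_saturating_power_law(upstream, downstream)
print(f"k={k_hat:.3f} alpha={alpha_hat:.3f} irreducible={floor_hat:.3f}")
```

Under this assumed form, driving the upstream error toward zero moves the downstream error toward the floor e_IR rather than toward zero, which is one simple way to express the saturation the abstract reports.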

Code & Models

Models

fcxfcx/owlv2 (Hugging Face model)

Videos

Exploring the Limits of Large Scale Pre-training · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms